DESCRIPTION

METHOD OF TARGETED GENE DISRUPTION, GENOME OF
HYPERTHERMOSTABLE BACTERIUM AND GENOME CHIP USING THE SAME

TECHNICAL FIELD 

The present invention relates to genomics. More
specifically, the present invention relates to a genome of
a hyperthermostable bacterium and a genome chip thereof.
The present invention also relates to a novel method for
targeted gene disruption.



BACKGROUND ART 

Hyperthermostable bacteria survive in high-temperature
environments, and proteins (such as enzymes)
produced by the bacteria are generally thermostable, i.e.,
structurally stable. Further, archaebacteria, to which the
hyperthermostable bacteria belong, are living organisms
different from conventionally known prokaryotic or
eukaryotic organisms. Therefore, it is clear that the
hyperthermostable bacteria are evolutionarily different
from these organisms. Accordingly, even if an enzyme
derived from the hyperthermostable bacteria has functions
similar to those of known enzymes derived from prokaryotic
or eukaryotic cells, the enzymes derived from the
hyperthermostable bacteria are often structurally and/or
enzymatically different from conventional enzymes. For
example, chaperonin isolated from the KOD-1 strain
(Thermococcus kodakaraensis KOD1, hereinafter also called
KOD1 or KOD1 strain; Morikawa, M. et al.,
Appl. Environ. Microbiol. 60(12), 4559-4566 (1994)), a
hyperthermostable bacterium, has functions similar to those
of GroEL from Escherichia coli. However, GroEL forms a 14-mer
and further complexes with GroES, which forms a 7-mer, in order
to achieve its functions, whereas the chaperonin from the KOD1
strain functions alone (Yan, Z. et al., Appl. Environ.
Microbiol. 63: 785-789).

Gene disruption using a plasmid is conventionally
known as a method for targeted disruption of a gene in
thermostable bacteria (Bartolucci, Third International
Congress on Extremophiles, Hamburg, Germany, September 3-7,
2000). The method of Bartolucci utilizes a homogeneous or
heterogeneous expression system with a recombinant protein
using a thermostable bacterium. However, it is unclear
whether targeted genes are definitely disrupted by this
method, and therefore it cannot be said that efficient
targeted disruption is achieved.

Accordingly, there is a limitation in gene targeting 
based on information of some of the genes. 

Therefore, it is an object of the invention to provide
a method for gene targeting in an efficient and definite
manner at an arbitrary site of a genome of a living organism,
and a kit therefor.

Further, there is no method as of this date for
analysing a genome as a whole in an efficient and/or global
manner by placing the genome of a hyperthermostable bacterium
onto a chip. Therefore, it is another object of the invention
to develop a technology for analysing such a genome as a
whole in an efficient and/or global manner.

SUMMARY OF INVENTION

The above-identified problem has been solved by using
an entire sequence of a genome of a living organism for
targeting a portion of chromosomes thereof. In particular,
the present invention demonstrates that the above-mentioned
method has been carried out in an efficient and definite
manner by sequencing the whole genome of Thermococcus
kodakaraensis KOD1 strain, a strain of thermostable
bacteria, as an example of a genomic sequence.

The present invention also provides for the first time
a technology for analyzing an entire genome in an efficient
and/or global manner by sequencing the entire genomic
sequence of Thermococcus kodakaraensis KOD1 strain, a
strain of the thermostable bacteria, as an example of the
genomic sequence. Therefore, it is now possible to simulate
gene expression of the organism per se on a chip.

Accordingly, the present invention provides the
following:

(1) A method for targeted disruption of an arbitrary gene in
the genome of a living organism, comprising the steps of:

A) providing information of the entire sequence of the
genome of the living organism;

B) selecting at least one arbitrary region of the
sequence;

C) providing a vector comprising a sequence
complementary to the selected region and a marker gene;

D) transforming the living organism with the vector;
and

E) placing the living organism in a condition allowing
homologous recombination.

(2) The method according to Item 1, wherein in the step B),
the region comprises at least two regions.



(3) The method according to Item 1, wherein the vector
further comprises a promoter.

(4) The method according to Item 1, further comprising the
step of detecting an expression product of the marker gene.

(5) The method according to Item 1, wherein the marker gene
is located in the selected region.

(6) The method according to Item 1, wherein the marker gene is
located outside of the selected region.

(7) The method according to Item 1, wherein the genome is
the genome of Thermococcus kodakaraensis KOD1.

(8) The method according to Item 1, wherein the genome has
a sequence set forth in SEQ ID NO: 1 or 1087.

(9) The method according to Item 1, wherein the region
comprises a sequence encoding at least one sequence selected
from the group consisting of SEQ ID NOs: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157.

(10) A nucleic acid molecule having a sequence set forth 
in SEQ ID NO: 1 or 1087. 

(11) A nucleic acid molecule comprising at least eight
contiguous nucleotides of a sequence set forth
in SEQ ID NO: 1 or 1087.

(12) A nucleic acid molecule comprising a sequence encoding
at least one amino acid sequence
selected from the group consisting of SEQ ID NOs: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157; or
a sequence having at least 70 % homology thereto.

(13) A nucleic acid molecule wherein, when the reading frame
of Table 2 is f-1, f-2 or f-3, the nucleic acid molecule
has a sequence from the position of nucleic acid number
(sense strand, start) of SEQ ID NO: 1 of Table 2, to the
position of nucleic acid number (sense strand, stop), or a
sequence having at least 70 % homology thereto, or when the
reading frame of Table 2 is r-1, r-2 or r-3, the nucleic
acid molecule has a sequence from the position of nucleic
acid number (antisense strand, start) of SEQ ID NO: 1087
of Table 2, to the position of nucleic acid number (antisense
strand, stop), or a sequence having at least 70 % homology
thereto.

(14) A polypeptide comprising at least one amino acid
sequence selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto.

(15) A polypeptide comprising at least three contiguous
amino acids of an amino acid sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a sequence having at least 70 %
homology thereto.

(16) A polypeptide comprising at least eight contiguous
amino acids of an amino acid sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a sequence having at least 70 %
homology thereto.

(17) A polypeptide comprising at least three contiguous
amino acids of an amino acid sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a sequence having at least 70 %
homology thereto, wherein the polypeptide has biological
activity.

(18) The polypeptide according to Item 17, wherein the
biological activity comprises a function set forth in Table
2.

(19) A method for screening for a heat resistant protein,
comprising the steps of:

A) providing the entire sequence of the genome of a
thermoresistant living organism;

B) selecting at least one arbitrary region of the
sequence;

C) providing a vector comprising a sequence
complementary to the selected region and a gene encoding
a candidate for the heat resistant protein;

D) transforming the living organism with the vector;

E) placing the thermoresistant living organism in a
condition allowing homologous recombination to occur;

F) selecting the thermoresistant living organism in
which homologous recombination has occurred; and

G) assaying to identify the thermoresistant protein.

(20) A kit for screening for a thermoresistant protein,
comprising:

A) a thermoresistant living organism; and

B) a vector comprising a sequence complementary to the
selected region and a gene encoding a candidate for the
thermoresistant protein.

(21) The kit according to Item 20, further comprising an
assay system for identifying the thermoresistant protein.

(22) The kit according to Item 20, wherein the
thermoresistant living organism is a hyperthermophilic
bacterium.

(23) The kit according to Item 20, wherein the
thermoresistant living organism is Thermococcus
kodakaraensis KOD1.



(24) A biomolecule chip having at least one nucleic acid
molecule having at least eight contiguous or non-contiguous
nucleotides of the sequences set forth in SEQ ID NO: 1 or
1087, or a variant thereof, located therein.

(25) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof is located to
cover the sequences set forth in SEQ ID NO: 1 or 1087.

(26) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises any
open reading frame of the sequences set forth in SEQ ID NO:
1 or 1087.

(27) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises
substantially all open reading frames of the sequences set
forth in SEQ ID NO: 1 or 1087.



(28) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises a
sequence encoding at least one sequence selected from the
group consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157.

(29) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises
substantially all the sequences encoding sequences selected
from the group consisting of SEQ ID NOs: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157.

(30) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises at
least eight contiguous nucleotide lengths of substantially
all the sequences encoding sequences selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157.

(31) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises at
least fifteen contiguous nucleotide lengths of
substantially all the sequences encoding sequences selected
from the group consisting of SEQ ID NOs: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157.

(32) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises at
least thirty contiguous nucleotide lengths of substantially
all the sequences encoding sequences selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157.

(33) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises
substantially all the sequences encoding sequences selected
from the group consisting of SEQ ID NOs: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157, or sequences
with one or more amino acid substitutions, additions and/or
deletions thereto.

(34) The biomolecule chip according to Item 24, wherein the
nucleic acid molecule or the variant thereof comprises at
least eight contiguous nucleotide lengths of substantially
all the sequences encoding sequences selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or sequences with one
or more amino acid substitutions, additions and/or deletions
thereto.

(35) The biomolecule chip according to Item 24, wherein,
when the reading frame of Table 2 is f-1, f-2 or f-3, the
nucleic acid molecule or the variant thereof has a sequence
from the position of nucleic acid number (sense strand,
start) of SEQ ID NO: 1 of Table 2, to the position of nucleic
acid number (sense strand, stop), or a sequence having at
least 70 % homology thereto, or when the reading frame of
Table 2 is r-1, r-2 or r-3, the nucleic acid molecule has
a sequence from the position of nucleic acid number
(antisense strand, start) of SEQ ID NO: 1087 of Table 2,
to the position of nucleic acid number (antisense strand,
stop), or a sequence having at least 70 % homology thereto.

(36) The biomolecule chip according to Item 24, wherein the
substrate is addressable.

(37) A biomolecule chip with a polypeptide or a variant
thereof, having at least one amino acid sequence selected
from the group consisting of SEQ ID NOs: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157, or a sequence
having at least 70 % homology thereto, located therein.

(38) The biomolecule chip according to Item 37, wherein the
polypeptide or the variant thereof has at least three
contiguous amino acid lengths of at least one amino acid
sequence selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, located therein.

(39) The biomolecule chip according to Item 37, wherein the
polypeptide or the variant thereof has at least eight
contiguous amino acid lengths of at least one amino acid
sequence selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, located therein.

(40) The biomolecule chip according to Item 37, wherein the
polypeptide or the variant thereof has at least three
contiguous or non-contiguous amino acid lengths of at least
one amino acid sequence selected from the group consisting
of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, or a sequence having at least 70 % homology
thereto, and having a biological function, located therein.

(41) The biomolecule chip according to Item 40, wherein the
biological activity comprises a function set forth in Table
2.

(42) The biomolecule chip according to Item 40, wherein the
biological activity comprises epitope activity.

(43) A recording medium having stored therein information
of a nucleic acid sequence of a nucleic acid molecule having
at least eight contiguous or non-contiguous nucleotide
sequences of the sequences set forth in SEQ ID NO: 1 or
1087, or a variant thereof.

(44) The recording medium according to Item 43, wherein the
nucleic acid molecule or the variant thereof comprises at
least eight contiguous nucleotide lengths of substantially
all the sequences selected from the group consisting of SEQ
ID NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or sequences with one or more amino acid
substitutions, additions and/or deletions thereto.

(45) The recording medium according to Item 43, wherein, when
the reading frame of Table 2 is f-1, f-2 or f-3, the nucleic
acid molecule or the variant thereof has a sequence from
the position of nucleic acid number (sense strand, start)
of SEQ ID NO: 1 of Table 2, to the position of nucleic acid
number (sense strand, stop), or a sequence having at least
70 % homology thereto, or when the reading frame of Table
2 is r-1, r-2 or r-3, the nucleic acid molecule has a sequence
from the position of nucleic acid number (antisense strand,
start) of SEQ ID NO: 1087 of Table 2, to the position of
nucleic acid number (antisense strand, stop), or a sequence
having at least 70 % homology thereto.

(46) A storage medium comprising information of a
polypeptide or a variant thereof having at least one amino
acid sequence selected from the group consisting of SEQ ID
NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, located therein.

(47) The storage medium according to Item 46, wherein the
polypeptide or the variant thereof has at least three
contiguous amino acid lengths of at least one amino acid
sequence selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, located therein.

(48) The storage medium according to Item 46, wherein the
polypeptide or the variant thereof has at least eight
contiguous amino acid lengths of at least one amino acid
sequence selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, located therein.

(49) The storage medium according to Item 46, wherein the
polypeptide or the variant thereof has at least three
contiguous or non-contiguous amino acid lengths of at least
one amino acid sequence selected from the group consisting
of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, or a sequence having at least 70 % homology
thereto, and having a biological function, located therein.

(50) The storage medium according to Item 49, wherein the
biological activity comprises a function set forth in Table
2.

(51) A biomolecule chip having at least one antibody
against a polypeptide or a variant thereof, located on a
substrate, wherein the polypeptide or the variant thereof
comprises at least one amino acid sequence selected from
the group consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or a sequence having
at least 70 % homology thereto.

(52) An RNAi molecule having a sequence homologous to a
reading frame sequence wherein, when the reading frame of
Table 2 is f-1, f-2 or f-3, the reading frame sequence has
a sequence from the position of nucleic acid number (sense
strand, start) of SEQ ID NO: 1 of Table 2, to the position
of nucleic acid number (sense strand, stop), or a sequence
having at least 70 % homology thereto, or when the reading
frame of Table 2 is r-1, r-2 or r-3, the reading frame
sequence has a sequence from the position of nucleic acid
number (antisense strand, start) of SEQ ID NO: 1087 of Table
2, to the position of nucleic acid number (antisense strand,
stop), or a sequence having at least 70 % homology thereto.

(53) The RNAi molecule according to Item 52, which is an
RNA or a variant thereof comprising a double-stranded
portion of at least 10 nucleotides in length.

(54) The RNAi molecule according to Item 52, comprising a
3' overhang terminus.

(55) The RNAi molecule according to Item 54, wherein the
3' overhang terminus is a DNA of at least 2 nucleotides in
length.

(56) The RNAi molecule according to Item 54, wherein the
3' overhang terminus is a DNA of two to four nucleotides
in length.

The present biomolecule chip may be a DNA chip, a protein
chip or the like.
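
As a rough illustration of Items 24, 25 and 30 to 32, the following
sketch tiles a nucleotide sequence with probes of at least eight
contiguous nucleotides so that every position of the sequence is
covered. The function name `tile_probes`, the `step` parameter and
the toy sequence are assumptions made for illustration only and do
not appear in the specification; the integer address merely stands
in for a position on an addressable substrate.

```python
def tile_probes(sequence, probe_len=8, step=1):
    """Return (address, probe) pairs whose probes together cover `sequence`.

    `address` is a hypothetical stand-in for a spot on an addressable chip.
    """
    if probe_len < 8:
        raise ValueError("probes must have at least eight contiguous nucleotides")
    if not 1 <= step <= probe_len:
        raise ValueError("step must be between 1 and probe_len to avoid gaps")
    seq = sequence.upper()
    if len(seq) < probe_len:
        return []
    last_start = len(seq) - probe_len
    starts = list(range(0, last_start + 1, step))
    # Add a final probe when the step does not land exactly on the 3' end.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(address, seq[s:s + probe_len]) for address, s in enumerate(starts)]


if __name__ == "__main__":
    # Toy sequence, not taken from SEQ ID NO: 1 or 1087.
    for address, probe in tile_probes("ATGCGTACGTTAGC", probe_len=8, step=4):
        print(address, probe)
```

Longer probes (fifteen or thirty nucleotides, as in Items 31 and 32)
are obtained by changing `probe_len`.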

Hereinafter, the preferable embodiments of the present
invention are described. However, it should be appreciated
that those skilled in the art can readily and appropriately
carry out such embodiments of the invention from the
description of the present invention and the well-known
technology and common general knowledge of the art, and
readily understand the effects and advantages of the present
invention therefrom.
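
Steps B) and C) of Item 1, with two selected regions as in Item 2,
can be sketched in silico as follows. This is a minimal sketch under
stated assumptions, not the procedure of the invention: the genome is
treated as a linear string, the two selected regions are taken as the
flanks immediately upstream and downstream of the target, and the
names `build_disruption_cassette` and `simulate_double_crossover`,
the arm length and the toy sequences are all hypothetical.

```python
def build_disruption_cassette(genome, target_start, target_end, marker, arm_len):
    """Return upstream arm + marker + downstream arm.

    Coordinates are 0-based, half-open (an assumption of this sketch).
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if not 0 <= target_start < target_end <= len(genome):
        raise ValueError("target coordinates are outside the genome")
    if target_start < arm_len or len(genome) - target_end < arm_len:
        raise ValueError("not enough flanking sequence for the homology arms")
    upstream = genome[target_start - arm_len:target_start]
    downstream = genome[target_end:target_end + arm_len]
    return upstream + marker + downstream


def simulate_double_crossover(genome, cassette, arm_len):
    """Replace the region between the two arms with the cassette payload."""
    upstream, downstream = cassette[:arm_len], cassette[-arm_len:]
    payload = cassette[arm_len:-arm_len]
    i = genome.find(upstream)
    j = genome.find(downstream, i + arm_len) if i >= 0 else -1
    if i < 0 or j < 0:
        raise ValueError("homology arms were not found in the genome")
    return genome[:i + arm_len] + payload + genome[j:]


if __name__ == "__main__":
    toy_genome = "TTGACCGATA" + "ATGAAATAG" + "CCGTTAGCAT"  # toy data only
    cassette = build_disruption_cassette(toy_genome, 10, 19, "GGGCCC", arm_len=5)
    print(simulate_double_crossover(toy_genome, cassette, arm_len=5))
```

The simulation only checks that the target is replaced by the marker
between the two arms; it says nothing about transformation efficiency
or recombination frequency in a living organism.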

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a schematic diagram of double cross-over
disruption.

Figure 2 is a schematic diagram of double cross-over
disruption using linear DNA.

Figure 3 is a schematic diagram of single cross-over
disruption.

Figure 4 is a diagram showing a genome structure of the
present invention.

Figure 5 is another diagram showing a genome structure of
the present invention.

Figure 6 is another diagram showing a genome structure of
the present invention.

Figure 7 is an exemplary schematic diagram showing a genomic
biomolecule chip.

The description of the sequence listings is set forth
in another Table (Table 2).
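
The way a reading frame of Table 2 selects a sequence (Items 13, 35,
45 and 52) can be illustrated with the sketch below: frames f-1 to
f-3 are read from the sense strand (SEQ ID NO: 1) and frames r-1 to
r-3 from the antisense strand (SEQ ID NO: 1087), each between its
start and stop positions. The function name `extract_reading_frame`,
the use of 1-based inclusive positions and the toy sequences are
assumptions of this sketch; in the toy example the antisense strand
is simply generated as the reverse complement of the sense strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
SENSE_FRAMES = ("f-1", "f-2", "f-3")
ANTISENSE_FRAMES = ("r-1", "r-2", "r-3")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def extract_reading_frame(sense, antisense, frame, start, stop):
    """Slice the strand named by `frame` from `start` to `stop`.

    Positions are assumed to be 1-based and inclusive, counted on the
    strand that the frame refers to.
    """
    if frame in SENSE_FRAMES:
        strand = sense
    elif frame in ANTISENSE_FRAMES:
        strand = antisense
    else:
        raise ValueError("unknown reading frame: %r" % (frame,))
    if not 1 <= start <= stop <= len(strand):
        raise ValueError("start/stop positions are outside the strand")
    return strand[start - 1:stop]


if __name__ == "__main__":
    toy_sense = "CCATGAAATAGGG"  # toy data, not SEQ ID NO: 1
    toy_antisense = reverse_complement(toy_sense)
    print(extract_reading_frame(toy_sense, toy_antisense, "f-1", 3, 11))
```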

DETAILED DESCRIPTION OF THE INVENTION 

Hereinafter, the best modes of the present invention
are described. It should be understood throughout the
present specification that expression of a singular form
includes the concept of its plurality unless otherwise
mentioned. Specifically, articles for a singular form
(e.g., "a", "an", "the", etc. in English; "ein", "der", "das",
"die", etc. and their inflections in German; "un", "une",
"le", "la", etc. in French; "un", "una", "el", "la", etc.
in Spanish, and articles, adjectives, etc. in other
languages) include the concept of their plurality unless
otherwise mentioned. It should be also understood that the
terms as used herein have definitions typically used in the
art unless otherwise mentioned. Thus, unless otherwise
defined, all scientific and technical terms have the same
meanings as those generally used by those skilled in the
art to which the present invention pertains. If there is a
contradiction, the present specification (including the
definitions) takes precedence.

The embodiments provided hereinafter are provided for
better understanding of the present invention, and it should
be understood that the scope of the present invention
should not be limited to the following description.
Accordingly, it is apparent that those skilled in the art
can appropriately modify the present invention within the
scope thereof upon reading the description of the present
specification.

(Definition of Terms) 
The definitions of terms used herein are described
below.

As used herein, the term "organism" is used in the
widest sense in the art and refers to a living entity having
a genome. Organisms include prokaryotes (for example,
E. coli, hyperthermophilic bacteria and the like),
eukaryotes (for example, plants, animals and the like) and
the like.

As used herein, the term "genome" refers to a group
of genes of a set of chromosomes which is indispensable for
supporting living activity of a living organism. In
monoploid organisms such as bacteria, phages, viruses and
the like, one DNA or RNA molecule per se is responsible for
the genetic information defining these species and is
considered the genome. On the other hand, in diploid
organisms such as many eukaryotic organisms, a set of
chromosomes (for example, a human has 23 pairs of chromosomes,
a mouse has 20 pairs of chromosomes) in a germ cell, and
two sets of chromosomes in a somatic cell, comprise the
genome.

As used herein, the term "gene" refers to an element
defining a genetic trait. A gene is typically arranged in
a given sequence on a chromosome. A gene which defines the
primary structure of a protein is called a structural gene.
A gene which regulates the expression of a structural gene
is called a regulatory gene. As used herein, the term "gene"
may refer to "polynucleotide", "oligonucleotide", "nucleic
acid", and "nucleic acid molecule" and/or "protein",
"polypeptide", "oligopeptide" and "peptide".

The terms "protein", "polypeptide", "oligopeptide"
and "peptide" as used herein have the same meaning and refer
to an amino acid polymer having any length. This polymer
may be a straight, branched or cyclic chain. An amino acid
may be a naturally-occurring or non-naturally-occurring
amino acid, or a variant amino acid. The term may include
those assembled into a composite of a plurality of
polypeptide chains. The term also includes a
naturally-occurring or artificially modified amino acid
polymer. Such modification includes, for example,
disulfide bond formation, glycosylation, lipidation,
acetylation, phosphorylation, or any other manipulation or
modification (e.g., conjugation with a labeling moiety).
This definition encompasses, for example, a polypeptide
containing at least one amino acid analog (e.g.,
non-naturally-occurring amino acid, etc.), a peptide-like
compound (e.g., peptoid), and other variants known in the
art. Gene products comprising a sequence listed in the
Sequence Listing usually take a polypeptide form. As used
herein, the polypeptide of the present invention has a
specific sequence (a sequence set forth in the Sequence
Listing or a variant thereof). A variant sequence may be used
for a variety of purposes, such as diagnostic use, in the
present invention.

The terms "polynucleotide", "oligonucleotide", and
"nucleic acid" as used herein have the same meaning and refer
to a nucleotide polymer having any length. This term also
includes an "oligonucleotide derivative" or a
"polynucleotide derivative". An "oligonucleotide
derivative" or a "polynucleotide derivative" includes a
nucleotide derivative, or refers to an oligonucleotide or
a polynucleotide having linkages between nucleotides
different from typical linkages; the two terms are used
interchangeably. Examples of such an oligonucleotide
specifically include 2'-O-methyl-ribonucleotide, an
oligonucleotide derivative in which a phosphodiester bond
in an oligonucleotide is converted to a phosphorothioate
bond, an oligonucleotide derivative in which a
phosphodiester bond in an oligonucleotide is converted to
an N3'-P5' phosphoramidate bond, an oligonucleotide
derivative in which a ribose and a phosphodiester bond in
an oligonucleotide are converted to a peptide-nucleic acid
bond, an oligonucleotide derivative in which uracil in an
oligonucleotide is substituted with C-5 propynyl uracil,
an oligonucleotide derivative in which uracil in an
oligonucleotide is substituted with C-5 thiazole uracil,
an oligonucleotide derivative in which cytosine in an
oligonucleotide is substituted with C-5 propynyl cytosine,
an oligonucleotide derivative in which cytosine in an
oligonucleotide is substituted with phenoxazine-modified
cytosine, an oligonucleotide derivative in which ribose in
DNA is substituted with 2'-O-propyl ribose, and an
oligonucleotide derivative in which ribose in an
oligonucleotide is substituted with 2'-methoxyethoxy
ribose. Unless otherwise indicated, a particular nucleic
acid sequence also implicitly encompasses
conservatively-modified variants thereof (e.g., degenerate
codon substitutions) and complementary sequences as well
as the sequence explicitly indicated. Specifically,
degenerate codon substitutions may be produced by
generating sequences in which the third position of one or
more selected (or all) codons is substituted with mixed-base
and/or deoxyinosine residues (Batzer et al., Nucleic Acid
Res. 19:5081 (1991); Ohtsuka et al., J. Biol. Chem.
260:2605-2608 (1985); Rossolini et al., Mol. Cell. Probes
8:91-98 (1994)). The gene of the present invention usually
takes this polynucleotide form.
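
The degenerate codon substitution described above, in which the
third position of selected (or all) codons is replaced by a
mixed-base or deoxyinosine residue, can be sketched as follows. The
function name `degenerate_third_positions` and the toy sequence are
hypothetical; "N" is the IUPAC symbol for any base, and "I" is used
here as a conventional shorthand for deoxyinosine. The sketch does
not check whether a codon is actually degenerate at its third
position (for example, ATG for methionine is not), so the choice of
codons is left to the caller.

```python
def degenerate_third_positions(coding_seq, codon_indices=None, symbol="N"):
    """Replace the third base of selected codons with `symbol`.

    `codon_indices` holds 0-based codon numbers; None selects all codons.
    """
    if symbol not in ("N", "I"):
        raise ValueError("symbol must be 'N' (mixed base) or 'I' (deoxyinosine)")
    seq = coding_seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codon_indices is None:
        selected = set(range(len(codons)))
    else:
        selected = set(codon_indices)
    return "".join(
        codon[:2] + symbol if index in selected else codon
        for index, codon in enumerate(codons)
    )


if __name__ == "__main__":
    # Toy coding sequence (Met-Ala-Lys); only the alanine codon is relaxed.
    print(degenerate_third_positions("ATGGCTAAA", codon_indices=[1], symbol="I"))
```
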

As used herein, the term "nucleic acid molecule" is
used interchangeably with "nucleic acid",
"oligonucleotide", and "polynucleotide", including cDNA,
mRNA, genomic DNA, and the like. As used herein, nucleic
acid and nucleic acid molecule may be included in the concept
of the term "gene". A nucleic acid molecule encoding the
sequence of a given gene includes a "splice mutant (variant)".
Similarly, a particular protein encoded by a nucleic acid
encompasses any protein encoded by a splice variant of that
nucleic acid. "Splice mutants", as the name suggests, are
products of alternative splicing of a gene. After
transcription, an initial nucleic acid transcript may be
spliced such that different (alternative) nucleic acid
splice products encode different polypeptides. Mechanisms
for the production of splice variants vary, but include
alternative splicing of exons. Alternative polypeptides
derived from the same nucleic acid by read-through
transcription are also encompassed by this definition. Any
products of a splicing reaction, including recombinant
forms of the splice products, are included in this definition.
Such variants are useful for a variety of assays.

As used herein, the term "amino acid" may refer to a
naturally-occurring or non-naturally-occurring amino acid
as long as the object of the present invention is satisfied.

As used herein, the term "amino acid derivative" or
"amino acid analog" refers to an amino acid which is
different from a naturally-occurring amino acid and has a
function similar to that of the original amino acid. Such
amino acid derivatives and amino acid analogs are well known
in the art.

The term "naturally-occurring amino acid" refers to
an L-isomer of a naturally-occurring amino acid. The
naturally-occurring amino acids are glycine, alanine,
valine, leucine, isoleucine, serine, methionine, threonine,
phenylalanine, tyrosine, tryptophan, cysteine, proline,
histidine, aspartic acid, asparagine, glutamic acid,
glutamine, γ-carboxyglutamic acid, arginine, ornithine, and
lysine. Unless otherwise indicated, all amino acids as used
herein are L-isomers. An embodiment using a D-isomer of an
amino acid also falls within the scope of the present invention.

The term "non-naturally-occurring amino acid" refers
to an amino acid which is ordinarily not found in nature.
Examples of non-naturally-occurring amino acids include
D-forms of an amino acid as described above, norleucine,
para-nitrophenylalanine, homophenylalanine,
para-fluorophenylalanine, 3-amino-2-benzyl propionic acid,
D- or L-homoarginine, and D-phenylalanine.

As used herein, the term "amino acid analog" refers
to a molecule which is not an amino acid but has a physical
property and/or function similar to that of amino acids.
Examples of amino acid analogs include
ethionine, canavanine, 2-methylglutamine, and the like. An
amino acid mimic refers to a compound which has a structure
different from the general chemical structure of
amino acids but which functions in a manner similar to that
of naturally-occurring amino acids.

As used herein, the term "nucleotide" may be either 
naturally-occurring or non-naturally-occurring. The term 
"nucleotide derivative" or "nucleotide analog" refers to 
a nucleotide which is different from naturally-occurring 
nucleotides and has a function similar to that of the 
original nucleotide. Such nucleotide derivatives and 
nucleotide analogs are well known in the art. Examples of 
such nucleotide derivatives and nucleotide analogs include, 
but are not limited to, phosphorothioate, phosphoramidate, 
methylphosphonate, chiral-methylphosphonate, 2-O-methyl 
ribonucleotide, and peptide-nucleic acid (PNA). 

Amino acids may be referred to herein by either their 
commonly known three-letter symbols or by the one-letter 
symbols recommended by the IUPAC-IUB Biochemical 
Nomenclature Commission. Nucleotides, likewise, may be 
referred to by their commonly accepted single-letter codes. 

As used herein, the term "corresponding" amino acid 
or nucleic acid refers to an amino acid or nucleotide in 
a given polypeptide or polynucleotide molecule, which has, 
or is anticipated to have, a function similar to that of 
a predetermined amino acid or nucleotide in a polypeptide 
or polynucleotide as a reference for comparison. 
Particularly, in the case of enzyme molecules, the term 
refers to an amino acid which is present at a similar position 
in an active site and similarly contributes to catalytic 
activity. For example, in the case of an antisense molecule, 
a corresponding antisense molecule may be a similar portion 
in an ortholog corresponding to a particular portion of the 
antisense molecule. 

As used herein, the term "corresponding" gene (e.g., 
a polypeptide or polynucleotide molecule) refers to a gene 
in a given species, which has, or is expected to have, a 
function similar to that of a predetermined gene in a species 
as a reference for comparison. When there are a plurality 
of genes having such a function, the term refers to a gene 
having the same evolutionary origin. Therefore, a gene 
corresponding to a given gene may be an ortholog of the given 
gene. Thus, a gene corresponding to each gene can be found 
in other organisms. Such a corresponding gene can be 
identified by techniques well known in the art. For example, 
a corresponding gene in a given organism can be found by 
searching a sequence database of the organism (e.g., 
hyperthermophilic bacteria) using the sequence of a 
reference gene (e.g., a gene comprising a sequence set forth 
in the Sequence Listing etc.) as a query sequence. 

As used herein, the term "fragment" with respect to 
a polypeptide or polynucleotide refers to a polypeptide or 
polynucleotide having a sequence length ranging from 1 to 
n-1 with respect to the full length of the reference 
polypeptide or polynucleotide (of length n). The length of 
the fragment can be appropriately changed depending on the 
purpose. For example, in the case of polypeptides, the 
lower limit of the length of the fragment includes 3, 4, 
5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50 or more amino acids. 
Lengths represented by integers which are not herein 
specified (e.g., 11 and the like) may be appropriate as a 
lower limit. For example, in the case of polynucleotides, 
the lower limit of the length of the fragment includes 5, 
6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100 or more 
nucleotides. Lengths represented by integers which are not 
herein specified (e.g., 11 and the like) may be appropriate 
as a lower limit. As used herein, the length of polypeptides 
or polynucleotides can be represented by the number of amino 
acids or nucleic acids, respectively. However, the 
above-described numbers are not absolute. The 
above-described numbers, as the upper or lower limit, are 
intended to include some greater or smaller numbers (e.g., 
±10%), as long as the same function is maintained. For this 
purpose, "about" may be herein put ahead of the numbers. 
However, it should be understood that the interpretation 
of numbers is not affected by the presence or absence of 
"about" in the present specification. 
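
By way of illustration only, the length rule for a fragment (1 to n-1 for a reference of length n, optionally with a lower limit) can be sketched as follows. The function names are hypothetical, and the sketch assumes that a fragment is a contiguous portion of the reference sequence, which the definition above does not expressly state.

```python
def fragment_lengths(n: int) -> range:
    """Lengths allowed for a fragment of a reference of length n: 1 to n-1."""
    if n < 2:
        raise ValueError("a reference of length n < 2 has no fragments")
    return range(1, n)


def is_fragment(candidate: str, reference: str, min_len: int = 1) -> bool:
    """Return True if `candidate` is a fragment of `reference`.

    Assumption (not stated in the text): a fragment is a contiguous
    sub-sequence. `min_len` is the lower limit chosen for the purpose,
    e.g. 3 for polypeptides or 5 for polynucleotides.
    """
    lower = max(1, min_len)
    # The fragment must be shorter than the full-length reference.
    if not (lower <= len(candidate) < len(reference)):
        return False
    return candidate in reference
```

For example, under these assumptions `is_fragment("GATT", "GATTACA", min_len=3)` holds, whereas the full-length sequence itself is not a fragment.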

As used herein, the terms "agent specifically 
interacting with" a biological agent, or "specific agent", 
such as a polynucleotide, a polypeptide or the like, are 
used interchangeably and refer to an agent which has an 
affinity for the biological agent, such as a polynucleotide, 
a polypeptide or the like, which is representatively higher 
than or equal to the affinity for other non-related 
biological agents, such as polynucleotides, polypeptides 
or the like (particularly, those with identity of less than 
30%; in a specific embodiment, less than 99% identity), 
and preferably significantly (e.g., statistically 
significantly) higher. Such affinity may be measured by 
a hybridization assay, a binding assay and the like. When a 
biological agent is a polypeptide, a specific agent to the 
polypeptide includes a specific antibody, and it should be 
understood that in a particular embodiment, the specific 
agents of the present invention may include an agent specific 
to the specific antibodies. It should be understood that 
such specific agents to the specific antibodies include the 
polypeptide of interest per se. 

As used herein, the "agent" may be any substance or 
other agent (e.g., energy) as long as the intended purpose 
can be achieved. Examples of such a substance include, but 
are not limited to, proteins, polypeptides, oligopeptides, 
peptides, polynucleotides, oligonucleotides, nucleotides, 
nucleic acids (e.g., DNA such as cDNA, genomic DNA, or 
the like, and RNA such as mRNA), polysaccharides, 
oligosaccharides, lipids, low molecular weight organic 
molecules (e.g., hormones, ligands, information transfer 
substances, molecules synthesized by combinatorial 
chemistry, low molecular weight molecules, and the like 
(e.g., pharmaceutically acceptable low molecular weight 
ligands and the like)), and combinations of these molecules. 
Examples of an agent specific to a polynucleotide include, 
but are not limited to, a polynucleotide having 
complementarity to the sequence of the polynucleotide with 
a predetermined sequence homology (e.g., 70% or more 
sequence identity), a polypeptide such as a transcriptional 
agent binding to a promoter region, and the like. Examples 
of an agent specific to a polypeptide include, but are not 
limited to, an antibody specifically directed to the 
polypeptide or derivatives or analogs thereof (e.g., single 
chain antibody), a specific ligand or receptor when the 
polypeptide is a receptor or ligand, a substrate when the 
polypeptide is an enzyme, and the like. 

As used herein, the term "low molecular weight organic 
molecule" refers to an organic molecule having a relatively 
small molecular weight. Usually, the low molecular weight 
organic molecule has a molecular weight of about 1,000 
or less, but may have a molecular weight of more than 
1,000. Low molecular weight organic molecules can be 
ordinarily synthesized by methods known in the art or 
combinations thereof. These low molecular weight organic 
molecules may be produced by organisms. Examples of the low 
molecular weight organic molecule include, but are not 
limited to, hormones, ligands, information transfer 
substances, molecules synthesized by combinatorial 
chemistry, pharmaceutically acceptable low molecular 
weight molecules (e.g., low molecular weight ligands and 
the like), and the like. 

As used herein, the term "antibody" encompasses 
polyclonal antibodies, monoclonal antibodies, human 
antibodies, humanized antibodies, polyfunctional 
antibodies, chimeric antibodies, and anti-idiotype 
antibodies, and fragments thereof (e.g., F(ab')2 and Fab 
fragments), and other recombinant conjugates. These 
antibodies may be fused with an enzyme (e.g., alkaline 
phosphatase, horseradish peroxidase, α-galactosidase, and 
the like) via a covalent bond or by recombination. 

As used herein, the term "monoclonal antibody" refers 
to an antibody composition having a group of homologous 
antibodies. This term is not limited by the production 
manner thereof. This term encompasses all immunoglobulin 
molecules and Fab molecules, F(ab')2 fragments, Fv 
fragments, and other molecules having an immunological 
binding property of the original monoclonal antibody 
molecule. Methods for producing polyclonal antibodies and 
monoclonal antibodies are well known in the art, and are 
described more fully below. 

Monoclonal antibodies are prepared by using a standard 
technique well known in the art (e.g., Kohler and Milstein, 
Nature, 1975, 256:495) or a modification thereof (e.g., Buck 
et al., In Vitro, 18, 1982:377). Representatively, a mouse 
or rat is immunized with a protein bound to a protein carrier, 
and boosted. Subsequently, the spleen (and optionally 
several large lymph nodes) is removed and dissociated into 
single cells. If desired, the spleen cells may be screened 
(after removal of nonspecifically adherent cells) by 
applying a cell suspension to a plate or well coated with 
a protein antigen. B-cells that express membrane-bound 
immunoglobulin specific for the antigen bind to the plate, 
and are not rinsed away with the rest of the suspension. 
Resulting B-cells, or all dissociated spleen cells, are then 
induced to fuse with myeloma cells to form hybridomas. The 
hybridomas are used to produce monoclonal antibodies. 

As used herein, the term "antigen" refers to any 
substrate to which an antibody molecule may specifically 
bind. As used herein, the term "immunogen" refers to an 
antigen initiating activation of the antigen-specific 
immune response of a lymphocyte. 

As used herein, the term "single chain antibody" 
refers to a single chain polypeptide formed by linking a 
heavy chain fragment and a light chain fragment of the 
Fv region via a peptide crosslinker. 

As used herein, the term "composite molecule" refers 
to a molecule in which a plurality of molecules, such as 
polypeptides, polynucleotides, lipids, sugars, small 
molecules, or the like, are linked together. Examples of 
a composite molecule include, but are not limited to, 
glycolipids, glycopeptides, and the like. Such composite 
molecules can be herein used as a DICSl gene or a product 
thereof, or an agent of the present invention, as long as 
they have a similar function to that of the gene or the 
product thereof, or the agent of the present invention. 

As used herein, the term "isolated" biological agent 
(e.g., nucleic acid, protein, or the like) refers to a 
biological agent that is substantially separated or 
purified from other biological agents in cells of a 
naturally-occurring organism (e.g., in the case of nucleic 
acids, agents other than nucleic acids and a nucleic acid 
having nucleic acid sequences other than an intended nucleic 
acid; and in the case of proteins, agents other than proteins 
and proteins having an amino acid sequence other than an 
intended protein). The "isolated" nucleic acids and 
proteins include nucleic acids and proteins purified by a 
standard purification method. The isolated nucleic acids 
and proteins also include chemically synthesized nucleic 
acids and proteins. 

As used herein, the term "purified" biological agent 
(e.g., nucleic acids, proteins, and the like) refers to one 
from which at least a part of the naturally accompanying 
agents are removed. Therefore, ordinarily, the purity of 
a purified biological agent is higher than that of the 
biological agent in a normal state (i.e., concentrated). 

As used herein, the terms "purified" and "isolated" 
mean that the same type of biological agent is present 
preferably at least 75% by weight, more preferably at least 
85% by weight, even more preferably at least 95% by weight, 
and most preferably at least 98% by weight. 

As used herein, the term "expression" of a gene, a 
polynucleotide, a polypeptide, or the like, indicates that 
the gene or the like is affected by a predetermined action 
in vivo to be changed into another form. Preferably, the 
term "expression" indicates that genes, polynucleotides, 
or the like are transcribed and translated into polypeptides. 
In one embodiment of the present invention, genes may be 
transcribed into mRNA. More preferably, these polypeptides 
may have post-translational processing modifications. 

Therefore, as used herein, the term "reduction" of 
"expression" of a gene, a polynucleotide, a polypeptide, 
or the like indicates that the level of expression is 
significantly reduced in the presence of or under the action 
of the agent of the present invention as compared to when 
the action of the agent is absent. Preferably, the 
reduction of expression includes a reduction in the amount 
of expression of a polypeptide. As used herein, the term 
"increase" of "expression" of a gene, a polynucleotide, a 
polypeptide, or the like indicates that the level of 
expression is significantly increased in the presence of 
the action of the agent of the present invention as compared 
to when the action of the agent is absent. Preferably, the 
increase of expression includes an increase in the amount 
of expression of a polypeptide. As used herein, the term 
"induction" of "expression" of a gene indicates that the 
amount of expression of the gene is increased by applying 
a given agent to a given cell. Therefore, the induction of 
expression includes allowing a gene to be expressed when 
expression of the gene is not otherwise observed, and 
increasing the amount of expression of the gene when 
expression of the gene is observed. 

As used herein, the term "specifically expressed" in 
relation to a gene indicates that the gene is expressed in 
a specific site or for a specific period of time, at a level 
different from (preferably higher than) that in other sites 
or for other periods of time. The term "specifically 
expressed" indicates that a gene may be expressed only in 
a given site (specific site) or may be expressed in other 
sites. Preferably, the term "specifically expressed" 
indicates that a gene is expressed only in a given site. 

As used herein, the term "biological activity" refers 
to activity possessed by an agent (e.g., a polynucleotide, 
a protein, etc.) within an organism, including activities 
exhibiting various functions (e.g., transcription 
promoting activity, etc.). For example, when two agents 
interact with each other (the gene product of the present 
invention and the receptor therefor), the biological 
activity thereof includes the binding of the gene product 
of the present invention and the receptor therefor and a 
biological change (e.g., apoptosis) caused thereby. In 
another example, when a certain factor is an enzyme, the 
biological activity thereof includes its enzyme activity. 
In still another example, when a certain factor is a ligand, 
the biological activity thereof includes the binding of the 
ligand to a receptor corresponding thereto. The 
above-described biological activity can be measured by 
techniques well-known in the art. Alternatively, in the 
present invention, the cases of a modified molecule having 
similar activity in the living organism may be included in 
the definition of having biological activity. 

As used herein, the term "antisense (activity)" refers 
to activity which permits specific suppression or reduction 
of expression of a target gene. The antisense activity is 
ordinarily achieved by a nucleic acid sequence having a 
length of at least 8 contiguous nucleotides, which is 
complementary to the nucleic acid sequence of a target gene 
(e.g., genes of the present invention, etc.). A molecule 
having such antisense activity is called an antisense 
molecule. Such a nucleic acid sequence preferably has a 
length of at least 9 contiguous nucleotides, more preferably 
a length of at least 10 contiguous nucleotides, and even 
more preferably a length of at least 11 contiguous 
nucleotides, a length of at least 12 contiguous nucleotides, 
a length of at least 13 contiguous nucleotides, a length 
of at least 14 contiguous nucleotides, a length of at least 
15 contiguous nucleotides, a length of at least 20 contiguous 
nucleotides, a length of at least 30 contiguous nucleotides, 
a length of at least 40 contiguous nucleotides, and a length 
of at least 50 contiguous nucleotides. These nucleic acid 
sequences include nucleic acid sequences having at least 
70% homology thereto, more preferably at least 80%, even 
more preferably at least 90%, and still even more preferably 
at least 95%. Such a nucleic acid sequence is preferably 
complementary to a 5' terminal sequence of the nucleic acid 
sequence of a target gene. Such an antisense nucleic acid 
sequence includes the above-described sequences having one 
or several, or at least one, nucleotide substitutions, 
additions, and/or deletions. 
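
As a non-limiting sketch, the length criterion for an antisense sequence (at least 8 contiguous nucleotides complementary to the target) could be checked as shown below. The function names are hypothetical, DNA letters (A, C, G, T) are assumed, and only perfect complementarity is considered; the homology thresholds and the substitutions, additions, and deletions mentioned above are not modeled.

```python
# Watson-Crick complement table for DNA letters (assumed alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(antisense: str, target: str) -> int:
    """Length of the longest contiguous stretch of `antisense` that is
    perfectly complementary to some stretch of `target`."""
    probe = reverse_complement(antisense)  # same orientation as the target
    target = target.upper()
    best = 0
    for i in range(len(probe)):
        # Only try stretches longer than the best one found so far.
        for j in range(i + best + 1, len(probe) + 1):
            if probe[i:j] in target:
                best = j - i
            else:
                break  # a longer stretch starting at i cannot match either
    return best


def meets_antisense_length(antisense: str, target: str, min_len: int = 8) -> bool:
    """True if at least `min_len` contiguous nucleotides are complementary."""
    return longest_complementary_run(antisense, target) >= min_len
```

The default of 8 corresponds to the ordinary lower limit stated above; a stricter limit (e.g., 15 or 20) can be passed as `min_len`.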

As used herein, the term "RNAi" is an abbreviation of 
RNA interference and refers to a phenomenon where an agent 
for causing RNAi, such as double-stranded RNA (also called 
dsRNA), is introduced into cells and mRNA homologous thereto 
is specifically degraded, so that synthesis of gene products 
is suppressed, and also refers to a technique using the 
phenomenon. As used herein, RNAi may have the same meaning 
as that of an agent which causes RNAi. 

As used herein, the term "an agent causing RNAi" refers 
to any agent causing RNAi. As used herein, "an agent causing 
RNAi for a gene" indicates that the agent causes RNAi 
relating to the gene and the effect of RNAi is achieved (e.g., 
suppression of expression of the gene, and the like). 
Examples of such an agent causing RNAi include, but are not 
limited to, a sequence having at least about 70% homology 
to the nucleic acid sequence of a target gene or a sequence 
hybridizable under stringent conditions, RNA containing a 
double-stranded portion having a length of at least 10 
nucleotides or variants thereof. Herein, this agent may be 
preferably DNA containing a 3' protruding end, and more 
preferably the 3' protruding end has a length of 2 or more 
nucleotides (e.g., 2-4 nucleotides in length). 

Though not wishing to be bound by any theory, a 
mechanism which causes RNAi is considered as follows. When 
a molecule which causes RNAi, such as dsRNA, is introduced 
into a cell, an RNase III-like nuclease having a helicase 
domain (called dicer) cleaves the molecule on about a 20 
base pair basis from the 3' terminus in the presence of ATP 
in the case where the RNA is relatively long (e.g., 40 or 
more base pairs). As used herein, the term "siRNA" is an 
abbreviation of short interfering RNA and refers to short 
double-stranded RNA of 10 or more base pairs which is 
artificially chemically or biochemically synthesized, 
synthesized in the organism body, or produced by 
double-stranded RNA of about 40 or more base pairs being 
degraded within the body. siRNA typically has a structure 
having 5'-phosphate and 3'-OH, where the 3' terminus 
projects by about 2 bases. A specific protein is bound to 
siRNA to form RISC (RNA-induced silencing complex). This 
complex recognizes and binds to mRNA having the same sequence 
as that of siRNA and cleaves mRNA at the middle of siRNA 
due to RNase III-like enzymatic activity. It is preferable 
that the relationship between the sequence of siRNA and the 
sequence of mRNA to be cleaved as a target is a 100% match. 
However, base mutation at a site away from the middle of 
siRNA does not completely remove the cleavage activity by 
RNAi, leaving partial activity, while base mutation in the 
middle of siRNA has a large influence and the mRNA cleavage 
activity by RNAi is considerably lowered. By utilizing this 
property, mRNA having a mutation can be specifically degraded. 
Specifically, siRNA in which the mutation is provided in 
the middle thereof is synthesized and is introduced into 
a cell. Therefore, in the present invention, siRNA per se 
as well as an agent capable of producing siRNA (e.g., 
representatively dsRNA of about 40 or more base pairs) can 
be used as an agent capable of eliciting RNAi. 

Also, though not wishing to be bound by any theory, 
apart from the above-described pathway, the antisense 
strand of siRNA binds to mRNA and siRNA functions as a primer 
for RNA-dependent RNA polymerase (RdRP), so that dsRNA is 
synthesized. This dsRNA is a substrate for a dicer again, 
leading to production of new siRNA. It is considered that the 
action is amplified in this manner. Therefore, in the present invention, 
siRNA per se as well as an agent capable of producing siRNA, 
are useful. In fact, in insects and the like, for example, 
35 dsRNA molecules can substantially completely degrade 
1000 or more copies of intracellular mRNA, and therefore, 
it will be understood that siRNA per se, as well as an agent 
capable of producing siRNA, is useful. 

In the present invention, double-stranded RNA having 
a length of about 20 bases (e.g., representatively about 
21 to 23 bases) or less than about 20 bases, which is called 
siRNA, can be used. Expression of siRNA in cells can 
suppress expression of a pathogenic gene targeted by the 
siRNA. Therefore, siRNA can be used for the treatment, 
prophylaxis, prognosis, and the like of diseases. 

The siRNA of the present invention may be in any form 
as long as it can elicit RNAi. 

In another embodiment, an agent capable of causing 
RNAi may have a short hairpin structure having a sticky 
portion at the 3' terminus (shRNA; short hairpin RNA). As 
used herein, the term "shRNA" refers to a molecule of about 
20 or more base pairs in which a single-stranded RNA partially 
contains a palindromic base sequence and forms a 
double-strand structure therein (i.e., a hairpin structure). 
shRNA can be artificially synthesized chemically. 
Alternatively, shRNA can be produced by linking sense and 
antisense strands of a DNA sequence in reverse directions 
and synthesizing RNA in vitro with T7 RNA polymerase using 
the DNA as a template. Though not wishing to be bound by 
any theory, it should be understood that after shRNA is 
introduced into a cell, the shRNA is degraded in the cell 
into a length of about 20 bases (e.g., representatively 21, 
22, 23 bases), and causes RNAi as with siRNA, leading to 
the treatment effect of the present invention. It should 
be understood that such an effect is exhibited in a wide 
range of organisms, such as insects, plants, animals 
(including mammals), and the like. Thus, shRNA elicits RNAi 
as with siRNA and therefore can be used as an effective 
component of the present invention. shRNA may preferably 
have a 3' protruding end. The length of the double-stranded 
portion is not particularly limited, but is preferably about 
10 or more nucleotides, and more preferably about 20 or more 
nucleotides. Here, the 3' protruding end may be preferably 
DNA, more preferably DNA of at least 2 nucleotides in length, 
and even more preferably DNA of 2-4 nucleotides in length. 
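
For illustration, the linking of sense and antisense strands in reverse directions to give a hairpin-forming template may be sketched as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the loop sequence is an arbitrary placeholder that is not taken from the present specification, and the T7 promoter and the 3' protruding end are not modeled.

```python
# Watson-Crick complement table for DNA letters (assumed alphabet).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hairpin_template(sense: str, loop: str = "TTCAAGAGA") -> str:
    """Join a sense stretch, a loop, and the antisense stretch.

    The antisense stretch is the reverse complement of the sense stretch,
    so a transcript of this template can fold back on itself.
    `loop` is a hypothetical placeholder, not a sequence from the text.
    """
    sense = sense.upper()
    return sense + loop.upper() + reverse_complement(sense)


def stem_length(sense: str) -> int:
    """Length of the double-stranded portion formed by the hairpin."""
    return len(sense)
```

Under these assumptions, a sense stretch of 20 or more nucleotides gives a double-stranded portion within the preferred range stated above.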

An agent capable of causing RNAi used in the present 
invention may be artificially synthesized (chemically or 
biochemically) or naturally occurring. There is 
substantially no difference therebetween in terms of the 
effect of the present invention. A chemically synthesized 
agent is preferably purified by liquid chromatography or 
the like. 

An agent capable of causing RNAi used in the present 
invention can be produced in vitro. In this synthesis 
system, T7 RNA polymerase and T7 promoter are used to 
synthesize antisense and sense RNAs from template DNA. 
These RNAs are annealed and thereafter are introduced into 
a cell. In this case, RNAi is caused via the above-described 
mechanism, thereby achieving the effect of the present 
invention. Here, for example, the introduction of RNA into 
a cell can be carried out by a calcium phosphate method. 

Another example of an agent capable of causing RNAi 
according to the present invention is a single-stranded 
nucleic acid hybridizable to mRNA or all nucleic acid analogs 
thereof. Such agents are useful for the method and 
composition of the present invention. 

As used herein, "polynucleotides hybridizing under 
stringent conditions" refers to polynucleotides hybridizing 
under conditions commonly used and well known in the art. 
Such a polynucleotide can be 
obtained by conducting colony hybridization, plaque 
hybridization, Southern blot hybridization, or the like 
using a polynucleotide selected from the polynucleotides 
of the present invention. Specifically, a filter on which 
DNA derived from a colony or plaque is immobilized is used 
to conduct hybridization at 65°C in the presence of 0.7 to 
1.0 M NaCl. Thereafter, a 0.1 to 2-fold concentration SSC 
(saline-sodium citrate) solution (1-fold concentration SSC 
solution is composed of 150 mM sodium chloride and 15 mM 
sodium citrate) is used to wash the filter at 65°C. 
Polynucleotides identified by this method are referred to 
as "polynucleotides hybridizing under stringent 
conditions". Hybridization can be conducted in accordance 
with a method described in, for example, Molecular Cloning 
2nd ed., Current Protocols in Molecular Biology, Supplement 
1-38, DNA Cloning 1: Core Techniques, A Practical Approach, 
Second Edition, Oxford University Press (1995), and the like. 
Here, sequences hybridizing under stringent conditions 
exclude, preferably, sequences containing only A or T. 
"Hybridizable polynucleotide" refers to a polynucleotide 
which can hybridize to other polynucleotides under the 
above-described hybridization conditions. Specifically, 
the hybridizable polynucleotide includes at least a 
polynucleotide having a homology of at least 60% to the base 
sequence of DNA encoding a polypeptide having an amino acid 
sequence specifically herein disclosed, preferably a 
polynucleotide having a homology of at least 80%, and more 
preferably a polynucleotide having a homology of at least 
95%. 

The term "highly stringent conditions" refers to those 
conditions that are designed to permit hybridization of DNA 
strands whose sequences are highly complementary, and to 
exclude hybridization of significantly mismatched DNAs. 
Hybridization stringency is principally determined by 
temperature, ionic strength, and the concentration of 
denaturing agents such as formamide. Examples of "highly 
stringent conditions" for hybridization and washing are 
0.015 M sodium chloride, 0.0015 M sodium citrate at 65-68°C 
or 0.015 M sodium chloride, 0.0015 M sodium citrate, and 
50% formamide at 42°C. See Sambrook, Fritsch & Maniatis, 
Molecular Cloning: A Laboratory Manual (2nd ed., Cold Spring 
Harbor Laboratory, N.Y., 1989); Anderson et al., Nucleic 
Acid Hybridization: A Practical Approach Ch. 4 (IRL Press 
Limited, Oxford UK). More stringent conditions 
(such as higher temperature, lower ionic strength, higher 
formamide, or other denaturing agents) may be optionally 
used. Other agents may be included in the hybridization and 
washing buffers for the purpose of reducing non-specific 
and/or background hybridization. Examples are 0.1% bovine 
serum albumin, 0.1% polyvinylpyrrolidone, 0.1% sodium 
pyrophosphate, 0.1% sodium dodecylsulfate (NaDodSO4 or SDS), 
Ficoll, Denhardt's solution, sonicated salmon sperm DNA (or 
another noncomplementary DNA), and dextran sulfate, 
although other suitable agents can also be used. The 
concentration and types of these additives can be changed 
without substantially affecting the stringency of the 
hybridization conditions. Hybridization experiments are 
ordinarily carried out at pH 6.8-7.4; however, at typical 
ionic strength conditions, the rate of hybridization is 
nearly independent of pH. See Anderson et al., Nucleic Acid 
Hybridization: A Practical Approach Ch. 4 (IRL Press Limited, 
Oxford UK). 

Factors affecting the stability of a DNA duplex include 
base composition, length, and degree of base pair mismatch. 
Hybridization conditions can be adjusted by those skilled 
in the art in order to accommodate these variables and allow 
DNAs of different sequence relatedness to form hybrids. The 
melting temperature of a perfectly matched DNA duplex can 
be estimated by the following equation: 

Tm (°C) = 81.5 + 16.6 (log[Na+]) + 0.41 (% G+C) - 600/N 
- 0.72 (% formamide) 

where N is the length of the duplex formed, [Na+] is the molar 
concentration of the sodium ion in the hybridization or 
washing solution, and % G+C is the percentage of 
(guanine+cytosine) bases in the hybrid. For imperfectly 
matched hybrids, the melting temperature is reduced by 
approximately 1°C for each 1% mismatch. 
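
The above equation can be evaluated numerically, for example as in the following sketch. The function and parameter names are hypothetical, and the logarithm is assumed to be the base-10 logarithm, as is customary for this type of estimate; the specification itself writes only "log".

```python
import math


def duplex_tm_celsius(length: int, na_molar: float, gc_percent: float,
                      formamide_percent: float = 0.0) -> float:
    """Estimate the melting temperature (°C) of a perfectly matched duplex.

    length            -- N, the length of the duplex formed
    na_molar          -- [Na+], molar sodium ion concentration
    gc_percent        -- % G+C in the hybrid (0 to 100)
    formamide_percent -- % formamide in the solution (0 to 100)
    """
    if length <= 0 or na_molar <= 0:
        raise ValueError("length and sodium concentration must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)  # assumption: base-10 logarithm
            + 0.41 * gc_percent
            - 600.0 / length
            - 0.72 * formamide_percent)


def mismatched_tm_celsius(perfect_tm: float, mismatch_percent: float) -> float:
    """Lower the estimate by approximately 1°C for each 1% mismatch."""
    return perfect_tm - 1.0 * mismatch_percent
```

For a long duplex (N large, so that 600/N is negligible) of 50% G+C in 0.015 M sodium ion without formamide, this estimate comes to a little under 72°C.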

The term "moderately stringent conditions" refers to 
conditions under which a DNA duplex with a greater degree 
of base pair mismatching than could occur under "highly 
stringent conditions" is able to form. Examples of typical 
"moderately stringent conditions" are 0.015 M sodium 
chloride, 0.0015 M sodium citrate at 50-65°C or 0.015 M 
sodium chloride, 0.0015 M sodium citrate, and 20% formamide 
at 37-50°C. By way of example, "moderately stringent 
conditions" of 50°C in 0.015 M sodium ion will allow about 
a 21% mismatch. 

It will be appreciated by those skilled in the art that 
there may be no absolute distinction between "highly 
stringent conditions" and "moderately stringent 
conditions". For example, at 0.015 M sodium ion (no 
formamide), the melting temperature of perfectly matched 
long DNA is about 71°C. With a wash at 65°C (at the same 
ionic strength), this would allow for approximately a 6% 
mismatch. To capture more distantly related sequences, 
those skilled in the art can simply lower the temperature 
or raise the ionic strength. 
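
The relationship between wash temperature and tolerated mismatch used in the above example can be sketched as follows, assuming the rule of approximately 1°C of melting temperature per 1% mismatch. The function name and its parameters are hypothetical.

```python
def tolerated_mismatch_percent(perfect_tm_c: float, wash_temp_c: float,
                               degrees_per_percent: float = 1.0) -> float:
    """Approximate % mismatch that still survives a wash at `wash_temp_c`.

    perfect_tm_c        -- melting temperature of the perfectly matched duplex
    wash_temp_c         -- hybridization or wash temperature
    degrees_per_percent -- Tm reduction per 1% mismatch (about 1°C, assumed)
    """
    if degrees_per_percent <= 0:
        raise ValueError("degrees_per_percent must be positive")
    # A wash at or above the melting temperature tolerates no mismatch.
    return max(0.0, (perfect_tm_c - wash_temp_c) / degrees_per_percent)
```

With a melting temperature of about 71°C, a wash at 65°C gives about 6% and a wash at 50°C gives about 21%, in agreement with the figures given in the text.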

A good estimate of the melting temperature in 1 M NaCl 
for oligonucleotide probes up to about 20 nucleotides is 
given by: 

Tm = (2°C per A-T base pair) + (4°C per G-C base pair). 

Note that the sodium ion concentration in 6X salt sodium 
citrate (SSC) is 1 M. See Suggs et al., Developmental 
Biology Using Purified Genes 683 (Brown and Fox, eds., 1981). 
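
As a minimal sketch, this estimate for short oligonucleotide probes can be computed from the probe sequence alone. The function name is hypothetical, and the probe is assumed to be written with DNA letters and to be perfectly paired along its whole length.

```python
def oligo_tm_celsius(probe: str) -> int:
    """Estimate Tm (°C) in 1 M NaCl for a probe of up to about 20 nucleotides.

    Each A or T in the probe forms one A-T base pair (2°C each) and
    each G or C forms one G-C base pair (4°C each).
    """
    probe = probe.upper()
    at_pairs = probe.count("A") + probe.count("T")
    gc_pairs = probe.count("G") + probe.count("C")
    if not probe or at_pairs + gc_pairs != len(probe):
        raise ValueError("probe must be a non-empty sequence of A, C, G, T")
    return 2 * at_pairs + 4 * gc_pairs
```

The estimate is intended only for short probes; for longer duplexes the equation based on sodium ion concentration and % G+C is the appropriate one.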

A naturally-occurring nucleic acid encoding a protein 
(e.g., Pep5, p75, Rho GDI, MAG, p21, Rho, Rho kinase, or 
variants or fragments thereof, or the like) may be readily 
isolated from a cDNA library using PCR primers and 
hybridization probes containing part of a nucleic acid 
sequence indicated in the sequence listing. A preferable 
nucleic acid, or variants or fragments thereof, or the like 
is hybridizable to the whole or part of a sequence as set 
forth in SEQ ID NO: 1 or 1087 under low stringent conditions 
defined by hybridization buffer essentially containing 1% 
bovine serum albumin (BSA); 500 mM sodium phosphate 
(NaPO4); 1 mM EDTA; and 7% SDS at 42°C, and wash buffer 
essentially containing 2xSSC (600 mM NaCl; 60 mM sodium 
citrate); and 0.1% SDS at 50°C, more preferably under low 
stringent conditions defined by hybridization buffer 
essentially containing 1% bovine serum albumin (BSA); 
500 mM sodium phosphate (NaPO4); 15% formamide; 1 mM EDTA; 
and 7% SDS at 50°C, and wash buffer essentially containing 
1xSSC (300 mM NaCl; 30 mM sodium citrate); and 1% SDS at 
50°C, and most preferably under low stringent conditions 
defined by hybridization buffer essentially containing 1% 
bovine serum albumin (BSA); 200 mM sodium phosphate 
(NaPO4); 15% formamide; 1 mM EDTA; and 7% SDS at 50°C, and 
wash buffer essentially containing 0.5xSSC (150 mM NaCl; 
15 mM sodium citrate); and 0.1% SDS at 65°C. 

As used herein, the term "probe" refers to a substance
for use in searching, which is used in a biological
experiment, such as in vitro and/or in vivo screening or
the like, including, but not being limited to, for example,
a nucleic acid molecule having a specific base sequence or
a peptide containing a specific amino acid sequence.

Examples of a nucleic acid molecule as a usual probe
include one having a nucleic acid sequence having a length
of at least 8 contiguous nucleotides, which is homologous
or complementary to the nucleic acid sequence of a gene of
interest. Such a nucleic acid sequence may be preferably
a nucleic acid sequence having a length of at least 9
contiguous nucleotides, more preferably a length of at least
10 contiguous nucleotides, and even more preferably a length
of at least 11 contiguous nucleotides, a length of 12
contiguous nucleotides, a length of at least 13 contiguous
nucleotides, a length of at least 14 contiguous nucleotides,
a length of at least 15 contiguous nucleotides, a length
of at least 20 contiguous nucleotides, a length of at least
25 contiguous nucleotides, a length of 30 contiguous
nucleotides, a length of at least 40 contiguous nucleotides,
or a length of at least 50 contiguous nucleotides. A nucleic
acid sequence used as a probe includes a nucleic acid
sequence having at least 70% homology to the above-described
sequence, more preferably at least 80%, and even more
preferably at least 90%, or at least 95%.

As used herein, the term "search" indicates that a
given nucleic acid base sequence is utilized to find other
nucleic acid base sequences having a specific function
and/or property electronically or biologically, or by other
methods. Examples of electronic search include, but are not
limited to, BLAST (Altschul et al., J. Mol. Biol. 215:403-410
(1990)), FASTA (Pearson & Lipman, Proc. Natl. Acad. Sci.,
USA 85:2444-2448 (1988)), Smith and Waterman method (Smith
and Waterman, J. Mol. Biol. 147:195-197 (1981)), and
Needleman and Wunsch method (Needleman and Wunsch, J. Mol.
Biol. 48:443-453 (1970)), and the like. Examples of
biological search include, but are not limited to, a
macroarray in which genomic DNA is attached to a nylon
membrane or the like or a microarray (microassay) in which
genomic DNA is attached to a glass plate under stringent
hybridization, PCR and in situ hybridization, and the like.
It is herein intended that the genes used in the present
invention include corresponding genes identified by such
an electronic or biological search.

As used herein, the term "primer" refers to a substance
required for initiation of a reaction of a macromolecule
compound to be synthesized in a macromolecule synthesis
enzymatic reaction. In a reaction for synthesizing a
nucleic acid molecule, a nucleic acid molecule (e.g., DNA,
RNA, or the like) which is complementary to part of a
macromolecule compound to be synthesized may be used.

A nucleic acid molecule which is ordinarily used as
a primer includes one that has a nucleic acid sequence having
a length of at least 8 contiguous nucleotides, which is
complementary to the nucleic acid sequence of a gene of
interest. Such a nucleic acid sequence preferably has a
length of at least 9 contiguous nucleotides, more preferably
a length of at least 10 contiguous nucleotides, even more
preferably a length of at least 11 contiguous nucleotides,
a length of at least 12 contiguous nucleotides, a length
of at least 13 contiguous nucleotides, a length of at least
14 contiguous nucleotides, a length of at least 15 contiguous
nucleotides, a length of at least 16 contiguous nucleotides,
a length of at least 17 contiguous nucleotides, a length
of at least 18 contiguous nucleotides, a length of at least
19 contiguous nucleotides, a length of at least 20 contiguous
nucleotides, a length of at least 25 contiguous nucleotides,
a length of at least 30 contiguous nucleotides, a length
of at least 40 contiguous nucleotides, and a length of at
least 50 contiguous nucleotides. A nucleic acid sequence
used as a primer includes a nucleic acid sequence having
at least 70% homology to the above-described sequence, more
preferably at least 80%, even more preferably at least 90%,
and at least 95%. An appropriate sequence as a primer may
vary depending on the property of a sequence to be
synthesized (amplified). Those skilled in the art can
design an appropriate primer depending on a sequence of
interest. Such a primer design is well known in the art and
may be performed manually or using a computer program (e.g.,
LASERGENE, Primer Select, DNAStar).

As used herein, the term "epitope" refers to a basic
structure constituting an antigenic determinant.
Therefore, the term "epitope" includes a set of amino acid
residues which is involved in recognition by a particular
immunoglobulin, or in the context of T cells, those residues
necessary for recognition by T cell receptor proteins and/or
Major Histocompatibility Complex (MHC) receptors. This
term is also used interchangeably with "antigenic
determinant" or "antigenic determinant site". In the field
of immunology, in vivo or in vitro, an epitope is the set of
features of a molecule (e.g., primary, secondary and tertiary
peptide structure, and charge) that form a site recognized by
an immunoglobulin, T cell receptor or HLA molecule. An epitope
including a peptide comprises 3 or more amino acids in a
spatial conformation which is unique to the epitope.
Generally, an epitope consists of at least 5 such amino acids,
and more ordinarily, consists of at least 6, 7, 8, 9 or 10
such amino acids. The greater the length of an epitope, the
more the similarity of the epitope to the original peptide,
i.e., longer epitopes are generally preferable. This is not
necessarily the case when the conformation is taken into
account. Methods of determining the spatial conformation
of amino acids are known in the art, and include, for example,
X-ray crystallography and 2-dimensional nuclear magnetic
resonance spectroscopy. Furthermore, the identification
of epitopes in a given protein is readily accomplished using
techniques well known in the art. See, also, Geysen et al.,
Proc. Natl. Acad. Sci. USA (1984) 81: 3998 (general method
of rapidly synthesizing peptides to determine the location
of immunogenic epitopes in a given antigen); U.S. Patent
No. 4,708,871 (procedures for identifying and chemically
synthesizing epitopes of antigens); and Geysen et al.,
Molecular Immunology (1986) 23: 709 (technique for
identifying peptides with high affinity for a given
antibody). Antibodies that recognize the same epitope can
be identified in a simple immunoassay. Thus, methods for
determining epitopes including a peptide are well known in
the art. Such an epitope can be determined using a
well-known, common technique by those skilled in the art
if the primary nucleic acid or amino acid sequence of the
epitope is provided.

Therefore, an epitope including a peptide requires a
sequence having a length of at least 3 amino acids,
preferably at least 4 amino acids, more preferably at least
5 amino acids, at least 6 amino acids, at least 7 amino acids,
at least 8 amino acids, at least 9 amino acids, at least
10 amino acids, at least 15 amino acids, at least 20 amino
acids, and 25 amino acids. Epitopes may be linear or
conformational.

As used herein, "homology" of a gene (e.g., a nucleic
acid sequence, an amino acid sequence, or the like) refers
to the proportion of identity between two or more gene
sequences. As used herein, the identity of a sequence (a
nucleic acid sequence, an amino acid sequence, or the like)
refers to the proportion of the identical sequence (an
individual nucleic acid, amino acid, or the like) between
two or more comparable sequences. Therefore, the greater
the homology between two given genes, the greater the
identity or similarity between their sequences. Whether or
not two genes have homology is determined by comparing their
sequences directly or by a hybridization method under
stringent conditions. When two gene sequences are directly
compared with each other, these genes have homology if the
DNA sequences of the genes have representatively at least
50% identity, preferably at least 70% identity, more
preferably at least 80%, 90%, 95%, 96%, 97%, 98%, or 99%
identity with each other.
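
The direct comparison described above can be illustrated with a short
sketch. It assumes that the two sequences have already been aligned to equal
length (with "-" marking gaps) and that the alignment length is used as the
denominator; the function names percent_identity and homology_tier, and the
tier labels, are hypothetical and serve only to mirror the thresholds stated
above.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions carrying the identical residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def homology_tier(identity_pct):
    """Map an identity percentage onto the 50% / 70% / 80% thresholds."""
    if identity_pct >= 80:
        return "more preferable"
    if identity_pct >= 70:
        return "preferable"
    if identity_pct >= 50:
        return "homologous"
    return "not homologous"


pid = percent_identity("ATGCATGCAT", "ATGCATGGAT")  # one mismatch in ten
print(pid, homology_tier(pid))
```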

The similarity, identity and homology of base
sequences are herein compared using BLAST (sequence
analyzing tool) with the default parameters. The
similarity, identity and homology of amino acid sequences
are herein compared using BLASTX (sequence analyzing tool)
with the default parameters.

Amino acids may be referred to herein by either their
commonly known three letter symbols or by the one-letter
symbols recommended by the IUPAC-IUB Biochemical
Nomenclature Commission. Nucleotides, likewise, may be
referred to by their commonly accepted single-letter codes.

As used herein, the "percentage of (amino acid,
nucleotide, or the like) sequence identity, homology or
similarity" is determined by comparing two optimally
aligned sequences over a window of comparison, wherein the
portion of a polynucleotide or polypeptide sequence in the
comparison window may comprise additions or deletions (i.e.
gaps), as compared to the reference sequence (which does
not comprise additions or deletions (if the other sequence
includes an addition, a gap may occur)) for optimal alignment
of the two sequences. The percentage is calculated by
determining the number of positions at which the identical
nucleic acid bases or amino acid residues occur in both
sequences to yield the number of matched positions, dividing
the number of matched positions by the total number of
positions in the reference sequence (i.e. the window size)
and multiplying the results by 100 to yield the percentage
of sequence identity. When used in a search, homology is
evaluated by an appropriate technique selected from various
sequence comparison algorithms and programs well known in
the art. Examples of such algorithms and programs include,
but are not limited to, TBLASTN, BLASTP, FASTA, TFASTA and
CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci.
USA 85(8):2444-2448, Altschul et al., 1990, J. Mol. Biol.
215(3):403-410, Thompson et al., 1994, Nucleic Acids Res.
22(2):4673-4680, Higgins et al., 1996, Methods Enzymol.
266:383-402, Altschul et al., 1990, J. Mol. Biol.
215(3):403-410, Altschul et al., 1993, Nature Genetics
3:266-272). In a particularly preferable embodiment, the
homology of a protein or nucleic acid sequence is evaluated
using a Basic Local Alignment Search Tool (BLAST) well known
in the art (e.g., see Karlin and Altschul, 1990, Proc. Natl.
Acad. Sci. USA 87:2267-2268, Altschul et al., 1990, J. Mol.
Biol. 215:403-410, Altschul et al., 1993, Nature Genetics
3:266-272, Altschul et al., 1997, Nuc. Acids Res.
25:3389-3402). Particularly, 5 specialized-BLAST
programs may be used to perform the following tasks to
achieve comparison or search:

(1) comparison of an amino acid query sequence with a protein
sequence database using BLASTP and BLAST3;

(2) comparison of a nucleotide query sequence with a
nucleotide sequence database using BLASTN;

(3) comparison of a conceptually translated product in
which a nucleotide query sequence (both strands) is
converted over 6 reading frames with a protein sequence
database using BLASTX;

(4) comparison of a protein query sequence with a nucleotide
sequence database converted over all 6 reading frames (both
strands) using TBLASTN; and

(5) comparison of a nucleotide query sequence converted over
6 reading frames with a nucleotide sequence database
converted over 6 reading frames using TBLASTX.
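
The five tasks listed above amount to a lookup from the type of the query
and the type of the database to a program name. A minimal sketch follows,
assuming only the standard behaviour of the BLAST family; the function name
choose_blast_program and the flag translated are hypothetical, and the code
selects a name only and does not run any search.

```python
def choose_blast_program(query_type, db_type, translated=False):
    """Pick a BLAST program name for the given query and database types.

    query_type, db_type: "protein" or "nucleotide".
    translated: for nucleotide against nucleotide, compare the six-frame
    translations (TBLASTX) instead of the nucleotides themselves (BLASTN).
    """
    valid = {"protein", "nucleotide"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("types must be 'protein' or 'nucleotide'")
    if query_type == "protein" and db_type == "protein":
        return "BLASTP"
    if query_type == "nucleotide" and db_type == "protein":
        return "BLASTX"   # query translated in 6 frames
    if query_type == "protein" and db_type == "nucleotide":
        return "TBLASTN"  # database translated in 6 frames
    return "TBLASTX" if translated else "BLASTN"


print(choose_blast_program("nucleotide", "protein"))
```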

The BLAST program identifies homologous sequences by
specifying analogous segments called "high score segment
pairs" between amino acid query sequences or nucleic acid
query sequences and test sequences obtained from preferably
a protein sequence database or a nucleic acid sequence
database. A large number of the high score segment pairs
are preferably identified (aligned) using a scoring matrix
well known in the art. Preferably, the scoring matrix is
the BLOSUM62 matrix (Gonnet et al., 1992, Science
256:1443-1445, Henikoff and Henikoff, 1993, Proteins
17:49-61). The PAM or PAM250 matrix may be used, although
they are not as preferable as the BLOSUM62 matrix (e.g.,
see Schwartz and Dayhoff, eds., 1978, Matrices for Detecting
Distance Relationships: Atlas of Protein Sequence and
Structure, Washington: National Biomedical Research
Foundation). The BLAST program evaluates the statistical
significance of all identified high score segment pairs and
preferably selects segments which satisfy a threshold level
of significance independently defined by a user, such as
a user set homology. Preferably, the statistical
significance of high score segment pairs is evaluated using
Karlin's formula (see Karlin and Altschul, 1990, Proc. Natl.
Acad. Sci. USA 87:2267-2268).

As used herein, the statement that a sequence is
"homologous" means that the homology thereof is so high that
homologous recombination occurs. Accordingly, those skilled
in the art can determine whether a sequence is "homologous"
by introducing a DNA capable of complementing a mutation in
a chromosome, and causing in vivo gene recombination. There
is a method for confirming such a homologous state by
determining incorporation of a DNA capable of
complementation by a phenotype thereof (for example, if a
green fluorescence protein is used, green fluorescence is
used). Accordingly, in order that a sequence be homologous,
homology between two sequences may be typically at least
about 70%, preferably at least about 80%, more preferably
at least about 90%, still more preferably at least about
95%, and most preferably, at least about 99%.

As used herein the term "region" of a sequence is a
portion having a certain length in the sequence. Such a
region usually has a function. When used for targeting
disruption of the present invention, the "region" of a
sequence is at least about 10 nucleotides in length,
preferably at least about 15 nucleotides in length, more
preferably at least about 20 nucleotides in length, still
more preferably at least about 30 nucleotides in length,
yet more preferably at least about 50 nucleotides in length.
Preferably, such a region may include a portion responsible
for genetic function. In a preferable embodiment, the
"region" of a sequence may be one or more genes.

As used herein the term "targeting" refers to directing
the disruption at a certain gene when used in the targeting
disruption of a gene.

As used herein the term "biological activity" refers
to an activity which an agent (for example, a polypeptide
or protein) may have in the living body, and includes those
attaining a variety of functions. For example, when an agent
is an enzyme, the biological activity thereof includes the
enzymatic activity thereof. In another example, when an
agent is a ligand, the binding thereof to the receptor
therefor is included. In the present invention, each gene
product has the biological activities described in Table
2. Alternatively, the polypeptide of the present invention
has an epitope activity.

As used herein the term "marker gene" refers to a gene
used as a label (or marker) in genetic analysis. Typically,
marker genes are those having a clear variant phenotype and
are easily detectable rather than having a detailed function.
In addition to genes for drug resistance, genes of
biochemical property (such as auxotrophic) are often used
in microorganisms. Genes for morphological properties may
also be used. Drug resistance genes include, but are not
limited to, for example, kanamycin resistance gene,
hygromycin resistance gene, ampicillin resistance gene,
chloramphenicol resistance gene, streptomycin resistance
gene, and the like.

As used herein the term "vector" refers to one which
can transfer a polynucleotide of interest into a cell of
interest. Such a vector includes, but is not limited to,
for example, one which allows autonomous replication in a
host cell such as a prokaryotic cell, yeast cell, animal
cell, plant cell, insect cell, animal individual or plant
individual or the like, or one which can be incorporated
into the chromosome, and comprises a promoter at an
appropriate position for transcription of the polynucleotide
of the present invention. Preferably, such a vector
includes one which can autonomously replicate in
Thermococcus kodakarensis KOD1.

As used herein the term "expression vector" refers to
a nucleic acid sequence which comprises a structural gene
and a promoter regulating the expression thereof, and a
number of regulatory elements operably linked in the host
cell. Preferably, regulatory elements may comprise a
terminator, a selective marker such as a drug resistance
gene (for example, kanamycin resistance gene, hygromycin
resistance gene and the like), and an enhancer. It is well
known in the art that the types of expression vectors used
in an organism (for example, plant), and the regulatory
elements used may vary depending on the host cell used. In
a plant, plant expression vectors used in the present
invention may further have a T-DNA region. The T-DNA region
enhances the efficiency of introduction of a gene when, in
particular, Agrobacterium is used to transform the plant.

As used herein the term "recombinant vector" refers
to a vector which can transfer a polynucleotide of interest
into a cell of interest. Such a vector includes, but is not
limited to, for example, one which allows autonomous
replication in a host cell such as a prokaryotic cell, yeast
cell, animal cell, plant cell, insect cell, animal
individual or plant individual or the like, or one which
can be incorporated into the chromosome, and comprises a
promoter at an appropriate position for transcription of the
polynucleotide of the present invention.

"Recombinant vectors" for prokaryotic cells include
pBTrp2, pBTac1, pBTac2 (both available from Roche Molecular
Biochemicals), pKK233-2 (Pharmacia), pSE280 (Invitrogen),
pGEMEX-1 (Promega), pQE-8 (QIAGEN), pKYP10 (Japanese
Laid-open Publication No.: 58-110600), pKYP200
(Agric. Biol. Chem., 48, 669 (1984)), pLSA1 (Agric. Biol. Chem.,
53, 277 (1989)), pGEL1 (Proc. Natl. Acad. Sci. USA, 82, 4306
(1985)), pBluescript II SK+ (Stratagene), pBluescript II
SK(-) (Stratagene), pTrs30 (FERM BP-5407), pTrs32 (FERM
BP-5408), pGHA2 (FERM BP-400), pGKA2 (FERM B-6798),
pTerm2 (Japanese Laid-Open Publication No.: 3-22979,
US4686191, US4939094, US5160735), pEG400
(J. Bacteriol., 172, 2392 (1990)), pGEX (Pharmacia), pET
systems (Novagen), pSupex, pUB110, pTP5, pC194, pTrxFus
(Invitrogen), pMAL-c2 (New England Biolabs), pUC19
(Gene, 33, 103 (1985)), pSTV28 (TaKaRa), pUC118 (TaKaRa),
pPA1 (Japanese Laid-Open Publication No.: 63-233798), and
the like.

As used herein, the term "promoter" refers to a base
sequence which determines the initiation site of
transcription of a gene and is a DNA region which directly
regulates the frequency of transcription. Transcription is
started by RNA polymerase binding to a promoter. A promoter
region is usually located within about 2 kbp upstream of
the first exon of a putative protein coding region.
Therefore, it is possible to estimate a promoter region by
predicting a protein coding region in a genomic base sequence
using DNA analysis software. A putative promoter region is
usually located upstream of a structural gene, but depending
on the structural gene, a putative promoter region may be
located downstream of a structural gene. Preferably, a
putative promoter region is located within about 2 kbp
upstream of the translation initiation site of the first
exon, but such a putative promoter region is not limited
to this and may be located in an intron or downstream of
3' terminus.

As used herein, the term "terminator" refers to a
sequence which is located downstream of a protein-encoding
region of a gene and which is involved in the termination
of transcription when DNA is transcribed into mRNA, and the
addition of a poly-A sequence.

When using the present invention, any method for
introducing a nucleic acid into a cell may be used as a method
for introducing a vector, including, for example,
transfection, transduction, and transformation (calcium
chloride method, electroporation method (Japanese
Laid-open Publication 60-251887), particle gun (gene gun)
method (Japanese Patent Nos. 2606856 and 2517813), and the
like).

As used herein, the term "transformant" refers to the
whole or a part of an organism, such as a cell, which is
produced by transformation. Examples of a transformant
include prokaryotic cells, yeast cells, animal cells, plant
cells, insect cells and the like. Transformants may be
referred to as transformed cells, transformed tissue,
transformed hosts, or the like, depending on the subject.
As used herein, all of the forms are encompassed, however,
a particular form may be specified in a particular context.

As used herein the term "homologous recombination"
refers to a recombination in the portion having a homologous
base sequence in a pair of double stranded DNA. In a living
organism, such homologous recombinations are observed in
a form of chromosomal crossover and the like.

As used herein the phrase "conditions under which
homologous recombination occurs" refers to conditions under
which homologous recombination occurs when an organism
having a genome and a nucleic acid molecule having a sequence
homologous to at least any one region of the genomic sequence
thereof, are present. Such conditions may differ depending
on the organism, and are well known for those skilled in
the art. Such conditions include, but are not limited to,
for example:

Tk-pyrF deleted strains No. 25 and No. 27 are cultured in 20 ml
of ASW-YT liquid medium
↓
Collect the bacteria from the culture medium (3 ml) per one
sample (No. 25, No. 27, five samples for each)
↓
Suspend the cells in 200 μl of 0.8xASW + 80 mM CaCl2, and let
stand on ice for 30 minutes
↓
3 μg pUC118/DS and 3 μg pUC118/DD are mixed and let stand
on ice for 1 hour (two samples for each. Equivalent volume
of TE buffer added sample was used as a control)
↓
heat shock at 85°C, 45 s
↓
let stand on ice for 10 minutes
↓
Preculture in Ura-ASW-AA liquid medium (proliferation
occurs based on the incorporated uracil)
↓
Culture on Ura-ASW-AA liquid medium (enriched for PyrF+
strain)
↓
Culture on Ura-ASW-AA solid medium

The present invention is not limited to the above conditions.
As used herein the composition of ASW (artificial sea
water) is as follows: 1 x Artificial sea water (ASW) (/L):
NaCl 20 g; MgCl2·6H2O 3 g; MgSO4·7H2O 6 g; (NH4)2SO4 1 g; NaHCO3
0.2 g; CaCl2·2H2O 0.3 g; KCl 0.5 g; NaBr 0.05 g; SrCl2·6H2O 0.02 g;
and Fe(NH4) citric acid 0.01 g.
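
Because the composition is given per litre of 1 x ASW, the amounts for other
volumes or strengths (for example the 0.8 x ASW used in the cell suspension
step) follow by simple scaling. The sketch below is illustrative only. It
assumes that the listed amounts read as 1 g of (NH4)2SO4, 0.2 g of NaHCO3 and
0.01 g of the iron ammonium citrate salt; the dictionary and function names
are hypothetical.

```python
# Grams per litre of 1 x ASW, as listed in the text (assumed readings noted above).
ASW_1X_G_PER_L = {
    "NaCl": 20.0,
    "MgCl2.6H2O": 3.0,
    "MgSO4.7H2O": 6.0,
    "(NH4)2SO4": 1.0,
    "NaHCO3": 0.2,
    "CaCl2.2H2O": 0.3,
    "KCl": 0.5,
    "NaBr": 0.05,
    "SrCl2.6H2O": 0.02,
    "Fe(NH4) citrate": 0.01,
}


def asw_amounts(volume_ml, strength=1.0):
    """Return grams of each component for volume_ml of ASW at the given strength."""
    if volume_ml <= 0 or strength <= 0:
        raise ValueError("volume and strength must be positive")
    factor = strength * volume_ml / 1000.0
    return {name: grams * factor for name, grams in ASW_1X_G_PER_L.items()}


# Example: components for 500 ml of 0.8 x ASW.
for name, grams in asw_amounts(500, strength=0.8).items():
    print(f"{name}: {grams:.4f} g")
```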

Homologous recombination may occur when there is at
least one homologous region between a genome and a vector,
and preferably, when there are two homologous regions
between the genome and the vector.

As used herein the term "cross-over" or "crossover",
when used for a chromosome, refers to an event in which a
pair of homologous chromosomes cross each other, resulting
in a new combination of nucleic acid sequences.

As used herein the term "single cross-over", when used
for a chromosome, refers to the case where there is one
homologous region causing the cross-over between the nucleic
acid molecules, and cross-over occurs only in that particular
region, resulting in one nucleic acid sequence being
incorporated into the other sequence.

As used herein the term "double cross-over", when used
for a chromosome, refers to the case where there are two
homologous regions between two nucleic acid molecules for
cross-over, and the nucleic acid sequences between the
homologous regions are replaced with each other.

As used herein, the term "expression" of a gene, a
polynucleotide, a polypeptide, or the like, indicates that
the gene or the like is affected by a predetermined action
in vivo to be changed into another form. Preferably, the
term "expression" indicates that genes, polynucleotides,
or the like are transcribed and translated into polypeptides.
In one embodiment of the present invention, genes may be
transcribed into mRNA. More preferably, these polypeptides
may have post-translational processing modifications.

As used herein the term "expression product" of a gene
refers to a substance resulting from expression of the gene,
and includes mRNA which is a transcription product, a
polypeptide which is a translation product, and a
polypeptide which is a post-translational product, and the
like. Detection of such expression products may be directly
or indirectly performed, and may be performed using a well
known technology in the art (for example, Southern blotting,
Northern blotting and the like). These technologies are
described elsewhere herein, as well as in the references
cited elsewhere herein.

Polypeptides used in the present invention may be
produced by, for example, cultivating primary culture cells
producing the peptides or cell lines thereof, followed by
separation or purification of the peptides from culture
supernatant. Alternatively, genetic manipulation
techniques can be used to incorporate a gene encoding a
polypeptide of interest into an appropriate expression
vector, transform an expression host with the vector, and
collect recombinant polypeptides from the culture
supernatant of the transformed cells. The above-described
host cell may be any host cell conventionally used in
genetic manipulation techniques as long as it can express
a polypeptide of interest while keeping the physiological
activity of the peptide (e.g., E. coli, yeast, an animal
cell, etc.). Conditions for culturing recombinant host
cells may be appropriately selected depending on the type
of host cell used. Any host cells which may be used in a
recombinant DNA technology may be used as a host cell in
the present invention, including bacterial cells, yeast
cells, animal cells, plant cells, insect cells, and the like.
A preferable host cell is a bacterial cell. Polypeptides
derived from the thus-obtained cells may have at least one
amino acid substitution, addition, and/or deletion or at
least one sugar chain substitution, addition, and/or
deletion as long as they have substantially the same function
as that of naturally-occurring polypeptides. When an
expression product is secreted extracellularly, for example,
the supernatant is obtained by centrifuging or filtering
a culture, and is either purified directly or first
concentrated by precipitation or ultrafiltration before
purification. When an expression product is accumulated
intracellularly, cells may be disrupted by a cell wall lysis
enzyme, change in osmolarity, use of glass beads,
homogenizer, or sonication or the like, to obtain cellular
extract for purification. Purification may be performed by
combining known methods in the art, such as ion exchange
chromatography, gel filtration, affinity chromatography,
electrophoresis and the like.

A given amino acid may be substituted with another
amino acid in a protein structure, such as a cationic region
or a substrate molecule binding site, without a clear
reduction or loss of interactive binding ability. A given
biological function of a protein is defined by the
interactive ability or other property of the protein.
Therefore, a particular amino acid substitution may be
performed in an amino acid sequence, or at the DNA code
sequence level, to produce a protein which maintains the
original property after the substitution. Therefore,
various modifications of peptides as disclosed herein and
DNA encoding such peptides may be performed without clear
losses of biological usefulness.

When the above-described modifications are designed,
the hydrophobicity indices of amino acids may be taken into
consideration. Hydrophobic amino acid indices play an
important role in providing a protein with an interactive
biological function, which is generally recognized in the
art (Kyte, J. and Doolittle, R.F., J. Mol. Biol.
157(1):105-132, 1982). The hydrophobic property of an
amino acid contributes to the secondary structure of a
protein and then regulates interactions between the protein
and other molecules (e.g., enzymes, substrates, receptors,
DNA, antibodies, antigens, etc.). Each amino acid is given
a hydrophobicity index based on the hydrophobicity and
charge properties thereof as follows: isoleucine (+4.5);
valine (+4.2); leucine (+3.8); phenylalanine (+2.8);
cysteine/cystine (+2.5); methionine (+1.9); alanine
(+1.8); glycine (-0.4); threonine (-0.7); serine (-0.8);
tryptophan (-0.9); tyrosine (-1.3); proline (-1.6);
histidine (-3.2); glutamic acid (-3.5); glutamine (-3.5);
aspartic acid (-3.5); asparagine (-3.5); lysine (-3.9); and
arginine (-4.5).

It is well known that if a given amino acid is
substituted with another amino acid having a similar
hydrophobicity index, the resultant protein may still have
a biological function similar to that of the original protein
(e.g., a protein having an equivalent enzymatic activity).
For such an amino acid substitution, the hydrophobicity
index is preferably within ±2, more preferably within ±1,
and even more preferably within ±0.5. It is understood in
the art that such an amino acid substitution based on
hydrophobicity is efficient.
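
The hydrophobicity criterion above can be checked mechanically. The
following sketch uses the index values listed in the preceding paragraph and
tests whether two residues lie within a chosen tolerance of each other; the
names HYDROPHOBICITY and similar_hydrophobicity, and the use of
three-letter codes as keys, are hypothetical choices made for illustration.

```python
# Hydrophobicity indices as listed in the text (Kyte and Doolittle, 1982).
HYDROPHOBICITY = {
    "Ile": 4.5, "Val": 4.2, "Leu": 3.8, "Phe": 2.8, "Cys": 2.5,
    "Met": 1.9, "Ala": 1.8, "Gly": -0.4, "Thr": -0.7, "Ser": -0.8,
    "Trp": -0.9, "Tyr": -1.3, "Pro": -1.6, "His": -3.2, "Glu": -3.5,
    "Gln": -3.5, "Asp": -3.5, "Asn": -3.5, "Lys": -3.9, "Arg": -4.5,
}


def similar_hydrophobicity(original, substitute, tolerance=2.0):
    """True if the two residues' indices differ by no more than tolerance."""
    diff = abs(HYDROPHOBICITY[original] - HYDROPHOBICITY[substitute])
    return diff <= tolerance


# Check one substitution against the three tolerances named in the text.
for tol in (2.0, 1.0, 0.5):
    print("Leu -> Met within", tol, ":", similar_hydrophobicity("Leu", "Met", tol))
```

A tighter tolerance admits fewer substitutions, which corresponds to the
"preferably", "more preferably" and "even more preferably" tiers above.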

A hydrophilicity index is also useful for modification
of an amino acid sequence of the present invention. As
described in US Patent No. 4,554,101, amino acid residues
are given the following hydrophilicity indices: arginine
(+3.0); lysine (+3.0); aspartic acid (+3.0±1); glutamic acid
(+3.0±1); serine (+0.3); asparagine (+0.2); glutamine
(+0.2); glycine (0); threonine (-0.4); proline (-0.5±1);
alanine (-0.5); histidine (-0.5); cysteine (-1.0);
methionine (-1.3); valine (-1.5); leucine (-1.8);
isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5);
and tryptophan (-3.4). It is understood that an amino acid
may be substituted with another amino acid which has a
similar hydrophilicity index and can still provide a
biological equivalent. For such an amino acid substitution,
the hydrophilicity index is preferably within ±2, more
preferably ±1, and even more preferably ±0.5.

The term "conservative substitution" as used herein
refers to an amino acid substitution in which the substituted
amino acid and the substituting amino acid have similar
hydrophilicity indices and/or hydrophobicity indices. For
example, a conservative substitution is carried out
between amino acids whose hydrophilicity or hydrophobicity
indices differ by within ±2, preferably within ±1, and
more preferably within ±0.5. Examples of the conservative
substitution include, but are not limited to, substitutions
within each of the following residue groups: arginine and
lysine; glutamic acid and aspartic acid; serine and
threonine; glutamine and asparagine; and valine, leucine,
and isoleucine, which are well known to those skilled in
the art.

As used herein, the term "silent substitution" refers
to a substitution in which the nucleotide sequence is
changed but the substituted nucleotides encode no amino
acid change. Such silent substitutions may be
performed using genetic code degeneracy. Such degeneracy
is well known in the art, and is also described in the
references cited herein.

As used herein, the term "variant" refers to a
substance, such as a polypeptide, polynucleotide, or the
like, which differs partially from the original substance.
Examples of such a variant include a substitution variant,
an addition variant, a deletion variant, a truncated variant,
an allelic variant, and the like. Examples of such a variant
also include, but are not limited to, a nucleotide or
polypeptide having one or several substitutions, additions
and/or deletions, or a nucleotide or polypeptide having at
least one substitution, addition and/or deletion. The term
"allele" as used herein refers to a genetic variant located
at a locus identical to that of a corresponding gene, where
the two genes are distinguished from each other. Therefore,
the term "allelic variant" as used herein refers to a variant
which has an allelic relationship with a given gene. Such
an allelic variant ordinarily has a sequence the same as
or highly similar to that of the corresponding allele, and
ordinarily has almost the same biological activity, though
in rare cases it has a different biological activity. The term
"species homolog" or "homolog" as used herein refers to one
that has an amino acid or nucleotide homology with a given
gene in a given species (preferably at least 60% homology,
more preferably at least 80%, at least 85%, at least 90%,
or at least 95% homology). A method for obtaining such a
species homolog is clearly understood from the description
of the present specification. The term "orthologs" (also
called orthologous genes) refers to genes in different
species derived from a common ancestor (due to speciation).
For example, in the case of the hemoglobin gene family having
a multigene structure, the human and mouse α-hemoglobin genes
are orthologs, while the human α-hemoglobin gene and the
human β-hemoglobin gene are paralogs (genes arising from gene
duplication). Orthologs are useful for estimation of
molecular phylogenetic trees. Usually, orthologs in
different species may have a function similar to that of
the gene in the original species. Therefore, orthologs of
the genes of the present invention may be useful in the
present invention.
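
The homology thresholds given above for a species homolog
(60%, 80%, 85%, 90%, 95%) can be illustrated with a simple
calculation. The following Python sketch is an assumption-laden
illustration rather than the method of the present
specification: it approximates homology by percent identity
over two sequences that have already been aligned to equal
length, and the function names and the gap character "-" are
hypothetical choices made here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns that hold the same residue in both rows.

    Both inputs must already be aligned to equal length; columns in which
    both rows contain a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def meets_homology_threshold(aligned_a, aligned_b, threshold=60.0):
    """True if the percent identity reaches the chosen threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTACGTTT"
    print(percent_identity(a, b), meets_homology_threshold(a, b, 80.0))
```

In practice the alignment itself would be produced by a
dedicated alignment program; the sketch only shows how a
threshold would be applied to its output.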

As used herein, the term "conservative (or
conservatively modified) variant" applies to both amino
acid and nucleic acid sequences. With respect to particular
nucleic acid sequences, conservatively modified variants
refer to those nucleic acids which encode identical or
essentially identical amino acid sequences. Because of the
degeneracy of the genetic code, a large number of
functionally identical nucleic acids encode any given
protein. For example, the codons GCA, GCC, GCG and GCU all
encode the amino acid alanine. Thus, at every position
where an alanine is specified by a codon, the codon can be
altered to any of the corresponding codons described without
altering the encoded polypeptide. Such nucleic acid
variations are "silent variations", which represent one
species of conservatively modified variation. Every
nucleic acid sequence herein which encodes a polypeptide
also describes every possible silent variation of the
nucleic acid. Those skilled in the art will recognize that
each codon in a nucleic acid (except AUG, which is ordinarily
the only codon for methionine, and UGG, which is ordinarily
the only codon for tryptophan) can be modified to yield a
functionally identical molecule. Accordingly, each silent
variation of a nucleic acid which encodes a polypeptide is
implicit in each described sequence. Preferably, such
modification may be performed while avoiding substitution
of cysteine, an amino acid which can greatly
affect the higher-order structure of a polypeptide.
Such a conservative modification or silent modification is
also within the scope of the present invention.
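
The degeneracy described above can be made concrete with a
short sketch. The following Python code is illustrative only:
it builds the standard genetic code (an assumption; the
present specification does not state which code table applies
to a given organism), writes codons in the DNA alphabet (T in
place of U), and uses the hypothetical helper names
synonymous_codons and translate.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(codon):
    """All codons (DNA alphabet) encoding the same amino acid as `codon`."""
    codon = codon.upper().replace("U", "T")
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(sequence):
    """Translate a coding sequence codon by codon (trailing bases ignored)."""
    sequence = sequence.upper().replace("U", "T")
    usable = len(sequence) - len(sequence) % 3
    return "".join(CODON_TABLE[sequence[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(synonymous_codons("GCA"))   # the four alanine codons
    print(synonymous_codons("AUG"))   # methionine has a single codon
    # A silent variation: GCA -> GCC leaves the encoded peptide unchanged.
    print(translate("ATGGCATGG"), translate("ATGGCCTGG"))
```

Replacing a codon with any other member of its synonymous set
yields a silent variation in the sense used above.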

The above-described nucleic acid can be obtained by
a well-known PCR method, i.e., chemical synthesis. This
method may be combined with, for example, site-specific
mutagenesis, hybridization, or the like.

As used herein, the term "substitution, addition or
deletion" for a polypeptide or a polynucleotide refers to
the substitution, addition or deletion of an amino acid or
its substitute, or a nucleotide or its substitute, with
respect to the original polypeptide or polynucleotide,
respectively. This is achieved by techniques well known in
the art, including a site-directed mutagenesis technique
and the like. A polypeptide or a polynucleotide may have
any number (>0) of substitutions, additions, or deletions.
The number can be as large as desired, as long as a variant
having that number of substitutions, additions or deletions
maintains an intended function (e.g., the cancer marker,
nervous disorder marker, etc.). For example, such a number
may be one or several, and preferably within 20% or 10% of
the full length, or no more than 100, no more than 50, no
more than 25, or the like.

As used herein, the term "specifically expressed" in
the case of genes indicates that a gene is expressed in a
specific site or in a specific period of time at a level
different from (preferably higher than) that in other sites
or periods of time. The term "specifically expressed"
covers both the case where a gene is expressed only in a
given site (specific site) and the case where it is also
expressed in other sites. Preferably, the term "specifically
expressed" indicates that a gene is expressed only in a
given site.

General molecular biological technologies which may
be used in the present invention may be readily performed
by those skilled in the art by referring to, for example,
Ausubel F.A. et al., ed. (1988), Current Protocols in
Molecular Biology, Wiley, New York, NY; Sambrook J et al.
(1987) Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

As used herein, the term "thermostable" refers to the
property of being resistant to a temperature which is
higher than the ambient temperature at which a usual
organism survives, and includes resistance to a
temperature higher than 37 °C. More usually,
"thermostable" refers to resistance to a temperature
higher than 50 °C. "Thermostable", when used for a living
organism, may refer to a property whereby the organism
can survive at both lower and higher temperatures. On
the other hand, "thermostable", when used for a polypeptide,
refers to resistance to a higher temperature, for
example a temperature higher than 37 °C or a temperature
higher than 50 °C. Among these, the property of being
resistant to temperatures higher than 90 °C is referred to as
"hyperthermostable".

As used herein, an organism which can survive at higher
temperatures is often called a "thermophilic bacterium".
Thermophilic bacteria usually have optimum survival
temperatures of 50-105 °C and do not grow at 30 °C or lower.
Among them, those having an optimum temperature of 90 °C
or higher are called "hyperthermophilic bacteria".

As used herein, the terms "hyperthermophilic
archaebacteria" and "hyperthermostable bacteria" are
used interchangeably to refer to a microorganism growing
at 90 °C or higher. Preferably, the hyperthermophilic
archaebacterium is the Thermococcus kodakaraensis KOD1 strain,
a bacterium producing a thermostable DNA ligase and a
thermostable thiol protease, isolated by the present
inventors (Morikawa, M. et al., Appl. Environ. Microbiol.
60(12), 4559-4566 (1994)). The KOD-1 strain was deposited in
the International Patent Organism Depositary (Chuo No. 6,
Higashi 1-Chome, 1-1, Tsukuba-shi, Ibaraki, 305-8566) under
accession number FERM P-15007. The KOD-1 strain
was originally classified as a Pyrococcus bacterium, as
described in the above-mentioned reference. However, when
we compared the 16S rRNA sequence with the data registered
in GenBank (R91.0, October 1995 + Daily Update) loaded
into DNASIS (Hitachi Software Engineering), it was revealed
that the KOD-1 strain belongs to the Thermococcus genus,
rather than the Pyrococcus genus, and it is thus presently
classified as Thermococcus kodakaraensis KOD-1.

As used herein, culturing of hyperthermophilic
archaebacteria producing hyperthermostable proteins may be
performed under any culture conditions, for example, those
described in Appl. Environ. Microbiol. 60(12), 4559-4566
(1994) (ibid). The culture may be either static culture or
jar fermentation culture with nitrogen gas, and may be
performed in either a continuous or batch manner.

The chromosomal DNA of a hyperthermophilic
archaebacterium may be obtained by solubilizing the cultured
bacterial cells with a detergent (for example, N-lauryl
sarcosine), and fractionating the resultant lysate by
cesium chloride-ethidium bromide equilibrium
density-gradient centrifugation (see, for example, Imanaka
et al., J. Bacteriol. 147: 776-786 (1981)). Libraries may be
obtained by digesting the resultant chromosomal DNA with a
variety of restriction enzymes, followed by ligating the
digests, with an enzyme such as T4 DNA ligase or the like,
into a vector (such as a phage or plasmid) which has
been digested with the same restriction enzyme or with a
restriction enzyme producing the same termini.

Libraries may be screened by selecting therefrom a clone
comprising a DNA encoding a thermophilic DNA ligase of
interest. Selection may be performed using, as probes, an
oligonucleotide designed based on a partial amino acid
sequence of the predetermined hyperthermophilic DNA ligase
and a cloned DNA deduced to have homology with the DNA of
interest. Alternatively, selection may be
performed by expressing the enzyme of interest. When the
activity of the enzyme of interest can be readily detected,
expression may be detected, for example, by detecting the
activity of the expression product against a substrate
added to the plate. Alternatively, when an antibody against
the enzyme of interest is available, expression may be
detected using the reactivity between the expression
product and the antibody.

Analysis of the resultant cloned DNA may be performed
by, for example, isolating a selected DNA, producing a
restriction map thereof, determining the nucleotide
sequence, and the like. Technologies such as preparation
of a cloned DNA, restriction enzyme processing, subcloning,
nucleotide sequencing and the like are well known in the
art, and may be performed by referring to "Molecular Cloning:
A Laboratory Manual, Second Edition" (Sambrook, Fritsch and
Maniatis ed., Cold Spring Harbor Laboratory Press, 1989).

Next, the resultant cloned DNA may be expressed by
operably inserting the same into an expression vector
applicable to the host cell used, transforming the host cell
with the expression vector, and culturing the transformed
host cell.

(Biomolecule chip)

The genomic information of the present invention may
be used for providing a biomolecule chip (for example, a DNA
chip, protein chip, glycoprotein chip, antibody chip and
the like).

The analysis of expression control of the genes of the
present invention may be performed by a genetic analysis
method using a DNA array. The present invention also
provides a virtual genome DNA array (also called a
"hyperthermophilic genomic array") using the genomic
sequence which was first identified in the present
invention.

The nucleotides of the present invention may be used
in a gene analysis method using a DNA array. DNA arrays
are widely reviewed (Shujunsha Ed., Saibo-kogaku (Cellular
Engineering), Special issue,
"DNA-maikuro-arei-to-saisin-PCR-ho" [DNA microarray and
Up-to-date PCR Method]). Further, plant analysis using a
DNA array has recently been performed (Schenk PM et al. (2000)
Proc. Natl. Acad. Sci. (USA) 97: 11655-11660). Hereinafter,
a DNA array and a gene analysis method using the same will
be briefly described.

"DNA array" refers to a device in which DNAs are
arrayed and immobilized on a plate. DNA arrays are divided
into DNA macroarrays, DNA microarrays, and the like
according to the size of the plate or the density of DNA
placed on the plate; however, these terms are not used
strictly herein.

The border between macro and micro is not strictly
determined. However, generally, "DNA macroarray" refers to
a high density filter in which DNA is spotted on a membrane,
while "DNA microarray" refers to a plate of glass, silicon,
or the like which carries DNA on a surface thereof. There
are cDNA arrays, oligoDNA arrays, and the like according
to the type of DNA placed.

A certain high density oligoDNA array, in which a
photolithography technique for production of semiconductor
integrated circuits is applied and a plurality of oligoDNAs
are simultaneously synthesized on a plate, is particularly
called a "DNA chip", an adaptation of the term "semiconductor
chip". Examples of the DNA chip prepared by this method
include GeneChip® (Affymetrix, CA), and the like (Marshall
A et al., (1998) Nat. Biotechnol. 16: 27-31 and Ramsay G
et al., (1998) Nat. Biotechnol. 16: 40-44). Preferably,
GeneChip® may be used in gene analysis using a microarray
according to the present invention. The DNA chip is defined
as described above in a narrow sense, but may refer to all
types of DNA arrays or DNA microarrays.

Thus, a DNA microarray is a device in which several
thousand to several tens of thousands or more of gene DNAs
are arrayed on a glass plate in high density. Therefore, it is
possible to analyze gene expression profiles or gene
polymorphism at a genomic scale by hybridization of cDNA,
cRNA or genomic DNA. With this technique, it has become
possible to analyze a signal transduction system and/or a
transcription control pathway (Fambrough D et al. (1999),
Cell 97, 727-741), the mechanism of tissue repair (Iyer VR
et al., (1999), Science 283: 83-87), the action mechanism
of medicaments (Marton MJ, (1999), Nat. Med. 4: 1293-1301),
fluctuations in gene expression during development and
differentiation processes on a wide scale, and the like; to
identify a gene group whose expression fluctuates
according to pathologic conditions; to find a novel gene
involved in a signal transduction system or in transcription
control; and the like. Further, as to gene polymorphism,
it has become possible to analyze a number of SNPs with
a single DNA microarray (Cargill M et al., (1999), Nat. Genet.
22: 231-238).

The principle of an assay using a DNA microarray will
be described. DNA microarrays are prepared by immobilizing
a number of different DNA probes in high density on a
solid-phase plate, such as a slide glass, whose surface is
appropriately processed. Thereafter, labeled nucleic
acids (targets) are subjected to hybridization under
appropriate hybridization conditions, and a signal from
each probe is detected by an automated detector. The
resultant data is subjected to large-scale analysis by a
computer. For example, in the case of gene monitoring, target
cDNAs into which fluorescent labels have been incorporated
by reverse transcription from mRNA are allowed to hybridize
to oligoDNAs or cDNAs serving as probes on a microarray, and
are detected with a fluorescence image analyzer. In this
case, various other signal amplification reactions may also
be carried out, such as cRNA synthesis reactions using T7
polymerase or amplification via enzymatic reactions.

Fodor et al. have developed a technique for
synthesizing polymers on a plate using a combination of
combinatorial chemistry and photolithography for
semiconductor production (Fodor SP et al., (1991) Science
251: 767-773). This is called a synthesized DNA chip.
Photolithography allows for extremely minute surface
processing, thereby making it possible to produce a DNA
microarray having a packing density as high as 10 µm²/DNA
sample. In this method, generally, about 25 to about 30 DNAs
are synthesized on a glass plate.

Gene expression analysis using a synthesized DNA chip was
reported by Lockhart et al. (Lockhart DJ et al. (1996) Nat.
Biotechnol. 14: 1675-1680). This method overcomes a
drawback of chips of this type, namely that the specificity
is low because the synthesized DNA is short. This
problem was solved by preparing, for the purpose of
monitoring the expression of one gene, perfect match (PM)
oligonucleotide probes corresponding to from about 10 to
about 20 regions and mismatch (MM) oligonucleotide probes
having a one-base mutation in the middle of the PM probes.
Here, the MM probes are used as an indicator of the
specificity of hybridization. Based on the signal ratio
between the PM probe and the MM probe, the level of gene
expression may be determined. When the signal ratio between
the PM probe and the MM probe is substantially 1:1, the result
is called cross hybridization, which is not interpreted as
a significant signal.
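
The use of the PM/MM signal ratio can be sketched as follows.
This Python example is a minimal illustration under stated
assumptions, not the analysis actually performed by any
commercial chip software: the tolerance of 0.2 around a 1:1
ratio, the function names, and the sample intensities are all
hypothetical, and MM signals are assumed to be greater than
zero.

```python
def pm_mm_ratio(pm_signal, mm_signal):
    """Signal ratio between a perfect match probe and its mismatch probe."""
    return pm_signal / mm_signal


def is_cross_hybridization(pm_signal, mm_signal, tolerance=0.2):
    """True when the PM:MM ratio is substantially 1:1 (within `tolerance`)."""
    return abs(pm_mm_ratio(pm_signal, mm_signal) - 1.0) <= tolerance


def specific_probe_pairs(pm_signals, mm_signals, tolerance=0.2):
    """Indices of probe pairs whose PM signal clearly exceeds the MM signal."""
    return [i for i, (pm, mm) in enumerate(zip(pm_signals, mm_signals))
            if pm_mm_ratio(pm, mm) > 1.0 + tolerance]


if __name__ == "__main__":
    pm = [300.0, 105.0, 250.0]   # hypothetical PM intensities
    mm = [100.0, 100.0, 240.0]   # hypothetical MM intensities
    print(specific_probe_pairs(pm, mm))
```

Probe pairs flagged as cross hybridization would be excluded
before the expression level of the gene is estimated from the
remaining pairs.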

A so-called attached DNA microarray is prepared by
attaching DNAs onto a slide glass, and fluorescence is
detected (see also http://cmgm.stanford.edu/pbrown). In
this method, no gigantic semiconductor production machine
is required, and only a DNA array machine and a detector
are needed to perform the assay in a laboratory. This method
has the advantage that it is possible to select the DNAs to
be attached. A high density array can be obtained by spotting
spots having a diameter of 100 µm at intervals of 100 µm,
for example. It is mathematically possible to spot 2500
DNAs per cm². Therefore, a usual slide glass (the effective
area is about 4 cm²) can carry about 10,000 DNAs.
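
The figures of 2500 DNAs per cm² and about 10,000 DNAs per
slide can be reproduced with a short calculation. The Python
sketch below assumes that "intervals of 100 µm" means a
100 µm gap between the edges of adjacent spots, so that the
center-to-center pitch is 200 µm; this reading is an
assumption made here because it yields the stated figures. The
function names are hypothetical.

```python
UM_PER_CM = 10_000  # 1 cm = 10,000 micrometers


def spots_per_cm2(spot_diameter_um, gap_um):
    """Number of spots that fit on 1 cm x 1 cm in a square grid."""
    pitch_um = spot_diameter_um + gap_um          # center-to-center distance
    spots_per_side = int(UM_PER_CM // pitch_um)   # spots along one 1 cm edge
    return spots_per_side ** 2


def slide_capacity(effective_area_cm2, spot_diameter_um, gap_um):
    """Approximate number of spots on a slide of the given effective area."""
    return int(effective_area_cm2 * spots_per_cm2(spot_diameter_um, gap_um))


if __name__ == "__main__":
    print(spots_per_cm2(100, 100))       # 50 spots per side, squared
    print(slide_capacity(4, 100, 100))   # effective area of about 4 cm^2
```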

As a labeling method for synthesized DNA arrays, for
example, double fluorescence labeling is used. In this
method, two different mRNA samples are labeled with different
fluorescent dyes, respectively. The two samples are
subjected to competitive hybridization on the same
microarray, and both fluorescences are measured. A
difference in gene expression is detected by comparing the
fluorescences. Examples of the fluorescent dye include,
but are not limited to, Cy5 and Cy3, which are most often
used, and the like. The advantage of Cy3 and Cy5 is that
their fluorescence wavelengths do not overlap
substantially. Double fluorescence labeling may be used to
detect mutations or polymorphisms in addition to differences
in gene expression.
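
The comparison of the two fluorescences is commonly expressed
as a log ratio per spot. The following Python sketch is an
illustration under stated assumptions: the two-fold threshold,
the function names, and the intensity values are hypothetical,
intensities are assumed to be positive, and the background
subtraction and normalization that a real analysis would need
are omitted.

```python
import math


def log2_ratio(cy5_signal, cy3_signal):
    """log2 of the Cy5/Cy3 intensity ratio for one spot."""
    return math.log2(cy5_signal / cy3_signal)


def call_expression(cy5_signal, cy3_signal, fold=2.0):
    """Classify a spot by comparing its log ratio with a fold threshold."""
    ratio = log2_ratio(cy5_signal, cy3_signal)
    threshold = math.log2(fold)
    if ratio >= threshold:
        return "higher in Cy5 sample"
    if ratio <= -threshold:
        return "higher in Cy3 sample"
    return "unchanged"


if __name__ == "__main__":
    spots = {"gene_a": (800.0, 200.0),   # hypothetical (Cy5, Cy3) intensities
             "gene_b": (100.0, 400.0),
             "gene_c": (110.0, 100.0)}
    for name, (cy5, cy3) in spots.items():
        print(name, round(log2_ratio(cy5, cy3), 2), call_expression(cy5, cy3))
```

A symmetric log scale is used so that a two-fold increase and
a two-fold decrease lie at equal distances from zero.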

An array machine may be used for assays using a DNA
array. In the array machine, basically, a pin tip or a slide
holder is moved in directions along the X, Y and Z axes by
a high-performance servo motor under the
control of a computer so that DNA samples are transferred
from a microtiter plate to the surface of a slide glass.
The pin tip is processed into various shapes. For example,
a DNA solution is retained in a cloven pen tip shaped like a
crow's bill and spotted onto a plurality of slide glasses.
After washing and drying cycles, the next DNA sample is
placed on the slide glasses. The above-described steps are
repeated. In this case, in order to prevent contamination of
the pin tip by a different sample, the pin tip has to be
thoroughly washed and dried. Examples of such an array
machine include SPBIO2000 (Hitachi Software Engineering Co.,
Ltd.; single strike type), GMS417 Arrayer (Takara Shuzo Co.,
Ltd.; pin ring type), Gene Tip Stamping (Nippon
Laser&Electronics Lab.; fountain pen type), and the like.

There are various DNA immobilizing methods for use in
assays using a DNA array. Glass as a material for a plate
has a smaller effective area for immobilization and a smaller
amount of electrical charge than membranes, and therefore is
given various coatings such as poly L-lysine coating
(Reference 55), silane finishing (Reference 56), or the
like. Further, a commercially available precoated slide
glass exclusively for DNA microarrays (e.g., polycarboimide
glass (Nissin Spinning Co., Ltd.) and the like) may also
be used. In the case of oligoDNA, a method of aminating a
terminal of the DNA and crosslinking the DNA to
silane-finished glass is available.

DNA microarrays may carry mainly cDNA fragments
amplified by PCR. When the concentration of cDNA is
insufficient, signals cannot be sufficiently detected in
some cases. When a sufficient amount of cDNA
fragments is not obtained by one PCR operation, PCR is
repeated. The resultant overall PCR products may be
purified and concentrated at one time. The probe cDNAs may
generally be a number of random cDNAs, but may instead be a
group of selected genes (e.g., the gene or promoter groups
of the present invention) or candidate genes for gene
expression changes obtained by RDA (representational
difference analysis) according to the purpose of an
experiment. It is preferable to avoid overlapping clones.
Clones may be prepared from a stock cDNA library, or cDNA
clones may be purchased.

In assays using a DNA array, a fluorescent signal
indicating hybridization on the DNA microarray is detected
by a fluorescence detector or the like. There are various
conventionally available detectors for this purpose. For
example, a research group at Stanford University has
developed an original scanner which is a combination of a
fluorescence microscope and a movable stage (see
http://cmgm.stanford.edu/pbrown). A conventional
fluorescence image analyzer for gels, such as FMBIO (Hitachi
Software Engineering), Storm (Molecular Dynamics), and the
like, can read a DNA microarray if the spots are not arrayed
in very high density. Examples of other available detectors
include ScanArray 4000 and 5000 (GeneralScanning; scan type
(confocal type)), GMS418 Array Scanner (Takara Shuzo; scan
type (confocal type)), Gene Tip Scanner (Nippon
Laser&Electronics Lab.; scan type (non-confocal type)),
Gene Tac 2000 (Genomic Solutions; CCD camera type), and
the like.

The amount of data obtained from DNA microarrays is
huge. Software for managing correspondences between clones
and spots, analyzing data, and the like is important. Such
software is available with each detection system
(Ermolaeva O et al. (1998) Nat. Genet. 20: 19-23). Further,
an example of a database format is GATC (genetic analysis
technology consortium) proposed by Affymetrix.

The present invention may also be used in gene analysis
using a differential display technique.

The differential display technique is a method for
detecting or identifying a gene whose expression fluctuates.
In this method, cDNA is prepared from each of at least two
samples, and amplified by PCR using a set of arbitrary
primers. Thereafter, the plurality of generated PCR products
are separated by gel electrophoresis. After the
electrophoresis pattern is produced,
expression-fluctuating genes are cloned based on relative
changes in the signal strength of each band.

The term "support" as used herein refers to a material
for constructing an array of the present invention.
Examples of a material for the substrate include any solid
material having a property of binding to a biomolecule used
in the present invention either by covalent bond or
noncovalent bond, or which can be derivatized in such a
manner as to have such a property.

Such a material for the substrate may be any material
capable of forming a solid surface, for example, including,
but not being limited to, glass, silica, silicon, ceramics,
silicon dioxide, plastics, metals (including alloys), and
naturally-occurring and synthetic polymers (e.g.,
polystyrene, cellulose, chitosan, dextran, and nylon). The
substrate may be formed of a plurality of layers made of
different materials. For example, an inorganic insulating
material, such as glass, silica glass, alumina, sapphire,
forsterite, silicon carbide, silicon oxide, silicon nitride,
or the like, can be used. Moreover, an organic material,
such as polyethylene, ethylene, polypropylene,
polyisobutylene, polyethylene terephthalate, unsaturated
polyester, fluorine-containing resin, polyvinyl chloride,
polyvinylidene chloride, polyvinyl acetate, polyvinyl
alcohol, polyvinyl acetal, acrylic resin,
polyacrylonitrile, polystyrene, acetal resin,
polycarbonate, polyamide, phenol resin, urea resin, epoxy
resin, melamine resin, styrene-acrylonitrile copolymer,
acrylonitrile-butadiene-styrene copolymer, silicone resin,
polyphenylene oxide, or polysulfone, can be used. In the
present invention, a film used for nucleic acid blotting,
such as a nitrocellulose film, a PVDF film, or the like,
can also be used. When the material constituting the
substrate is a solid phase, it is specifically referred to
as a "solid (phase) substrate" as used herein. As used
herein, such a substrate may be in the form of a plate,
microwell plate, chip, glass slide, film, bead, metal
(surface) and the like. Substrates may or may not be coated.

"Chip" as used herein refers to an
ultramicro-integrated circuit having various functions,
which constitutes a part of a system. "Biomolecule chip"
as used herein refers to a chip comprising a substrate and
a biomolecule, in which at least one biomolecule as set forth
herein is disposed on the substrate.

The term "address" as used herein refers to a unique
position on a substrate which can be distinguished from other
unique positions. An address is suitably used to access a
biomolecule associated with the address. Any entity
present at each address can have an arbitrary shape which
allows the entity to be distinguished from entities present
at other addresses (e.g., in an optical manner). The shape
of an address may be, for example, a circle, an ellipse,
a square, or a rectangle, or alternatively an irregular
shape.

The size of each address varies depending on, in
particular, the size of the substrate, the number of
addresses on the specific substrate, the amount of sample
to be analyzed and/or of available reagent, the size of the
biomolecule, and the resolution required for
any method in which the array is used. The size of an address
may range from 1-2 nm to several centimeters (e.g., 1-2 mm
to several centimeters, etc., 125x80 mm, 10x10 mm, etc.).
Any size of an address is possible as long as it matches
the array to which it is applied. In such a case, a substrate
material is formed into a size and a shape suitable for a
specific production process and application of an array.
For example, in the case of analysis where a large amount
of sample to be measured is available, an array may be
more economically constructed on a relatively large
substrate (e.g., 1 cm x 1 cm or more). Here, a detection
system which does not require much sensitivity and is
therefore economical may be further advantageously used.
On the other hand, when the amount of an available sample
to be analyzed and/or reagent is limited, an array may be
designed so that consumption of the sample and reagent is
minimized.

The spatial arrangement and forms of addresses are
designed in such a manner as to match the specific application
in which the microarray is used. Addresses may be densely
loaded, widely distributed, or divided into subgroups in
a pattern suitable for the specific type of sample to be
analyzed. "Array" as used herein refers to a pattern of
solid substances fixed on a solid phase surface or a film,
or a group of molecules having such a pattern. Typically,
an array comprises biomolecules (e.g., DNA, RNA,
protein-RNA fusion molecules, proteins, low-weight organic
molecules, etc.) conjugated to nucleic acid sequences fixed
on a solid phase surface or a film as if the biomolecule
captured the nucleic acid sequence. "Spots" of biomolecules
may be arranged on an array. "Spot" as used herein refers to
a predetermined set of biomolecules.

Any number of addresses may be arranged on a substrate,
typically up to 10® addresses, in other embodiments up to
10^ addresses, up to 10^ addresses, up to 10^ addresses, up
to 10^ addresses, up to 10^^ addresses, or up to 10^ addresses.
Therefore, when one biomolecule is placed on one address,
up to 10^ biomolecules can be placed on a substrate, and in
other embodiments up to 10^ biomolecules, up to 10^
biomolecules, up to 10^ biomolecules, up to 10^ biomolecules,
up to 10^ biomolecules, or up to 10^ biomolecules can be
placed on a substrate. In these cases, a smaller size of
substrate and a smaller size of address are suitable. In
particular, the size of an address may be as small as the
size of a single biomolecule (i.e., this size may be of the
order of 1-2 nm). In some cases, the minimum area of a
substrate is determined based on the number of addresses
on the substrate.

The term "biomolecule" as used herein refers to a
molecule related to an organism. An "organism" (or "bio-")
as used herein refers to a biological organic body, including,
but not being limited to, an animal, a plant, a fungus, a
virus, and the like. A biomolecule includes a molecule
extracted from an organism, but is not so limited. A
biomolecule is any molecule capable of having an influence on
an organism. Therefore, a biomolecule also includes a molecule
synthesized by combinatorial chemistry, and a low molecular
weight molecule capable of being used as a medicament (e.g.,
a low molecular weight ligand, etc.), as long as they are
intended to have an influence on an organism. Examples of
such a biomolecule include, but are not limited to, proteins,
polypeptides, oligopeptides, peptides, polynucleotides,
oligonucleotides, nucleotides, nucleic acids (e.g.,
including DNA (such as cDNA and genomic DNA) and RNA (such
as mRNA)), polysaccharides, oligosaccharides, lipids, low
molecular weight molecules (e.g., hormones, ligands, signal
transduction substances, low molecular weight organic
molecules, etc.), and complex molecules thereof, and the
like. A biomolecule also includes a cell itself, and a part
or the whole of a tissue, and the like, as long as they can
be coupled to a substrate of the present invention.
Preferably, a biomolecule includes a nucleic acid or a
protein. In a preferable embodiment, a biomolecule is a
nucleic acid (e.g., genomic DNA or cDNA, or DNA synthesized
by PCR or the like). In another preferable embodiment, a
biomolecule may be a protein. Preferably, one type of
biomolecule may be provided for each address on a substrate
of the present invention. In another embodiment, a sample
containing two or more types of biomolecules may be provided
for each
address . 
As used herein, the term "liquid phase" has the meaning
usually given to it in the art, and usually refers to a state
in a solution.

As used herein, the term "solid phase" has the meaning
usually given to it in the art, and usually refers to a state
in a solid. As used herein, liquid and solid are collectively
referred to as "fluid".

As used herein, the term "contact" refers to two matters
(for example, a composition and a cell) existing at a
distance sufficiently close for interaction with each
other.

As used herein, the term "interaction", when referring
to two matters, means that the two matters exert a
force on each other. Such interaction includes, but is not
limited to, for example, covalent bonding, hydrogen bonding,
van der Waals forces, ionic interaction, non-ionic
interaction, hydrophobic interaction, electrostatic
interaction and the like. Preferably, the interaction may
be a normal interaction occurring in a living body, such as
hydrogen bonding, hydrophobic interaction, and the like.

In one embodiment, the present invention may produce
a microarray for screening for a molecule, by binding a
library of biomolecules (for example, organic low-molecular
weight molecules, combinatorial chemistry products) to a
substrate, and using the same. A chemical library used in the
present invention may be produced or obtained by any means
including, but not limited to, for example, the use
of combinatorial chemistry technology, fermentation
technology, plant and cell extraction procedures and the
like. Production of a combinatorial library is well known
in the art. See, for example, E. R. Felder, Chimia 1994, 48,
512-541; Gallop et al., J. Med. Chem. 1994, 37, 1233-1251;
R. A. Houghten, Trends Genet. 1993, 9, 235-239; Houghten et
al., Nature 1991, 354, 84-86; Lam et al., Nature 1991, 354,
82-84; Carell et al., Chem. Biol. 1995, 3, 171-183; Madden
et al., Perspectives in Drug Discovery and Design 2, 269-282;
Cwirla et al., Biochemistry 1990, 87, 6378-6382; Brenner
et al., Proc. Natl. Acad. Sci. USA 1992, 89, 5381-5383; Gordon
et al., J. Med. Chem. 1994, 37, 1385-1401; Lebl et al.,
Biopolymers 1995, 37, 177-198; and references cited therein.
These references are incorporated herein by reference in
their entireties.

Methods, biomolecule chips and apparatuses of the
present invention may be used for, for example, diagnosis,
forensic medicine, drug discovery (screening for drugs) and
development, molecular biological analysis (for example,
array-based nucleotide sequencing and array-based gene
sequence analysis), analysis of protein properties
and functions, pharmacogenomics, proteomics, environmental
search, and additional biological and chemical analyses.

The present invention can also be applied to
polymorphism analysis, such as RFLP analysis, SNP (single
nucleotide polymorphism; "snip") analysis, or the like,
analysis of base sequences, and the like. The present
invention can also be used for screening of a medicament.

The present invention can be applied to any situation
requiring a biomolecule test other than medical
applications, such as food testing, quarantine, medicament
testing, forensic medicine, agriculture, husbandry,
fishery, forestry, and the like.

The present invention can also be used for detection
of a gene amplified by PCR, SDA, NASBA, or the like, other
than a sample directly collected from an organism. In the
present invention, a target gene can be labeled in advance
with an electrochemically active substance, a fluorescent
substance (e.g., FITC, rhodamine, acridine, Texas Red,
fluorescein, etc.), an enzyme (e.g., alkaline phosphatase,
peroxidase, glucose oxidase, etc.), a colloid particle
(e.g., a hapten, a light-emitting substance, an antibody,
an antigen, gold colloid, etc.), a metal, a metal ion, a
metal chelate (e.g., trisbipyridine, trisphenanthroline,
hexamine, etc.), or the like.

In one embodiment, a nucleic acid component is
extracted from these samples in order to test the nucleic
acid. The extraction is not limited to a particular method.
A liquid-liquid extraction method, such as the
phenol-chloroform method and the like, or a liquid-solid
extraction method using a carrier can be used.
Alternatively, a commercially available nucleic acid
extraction method such as QIAamp (QIAGEN, Germany) or the
like can be used. Next, a sample containing an extracted
nucleic acid component is subjected to a hybridization
reaction on a biomolecule chip of the present invention.
The reaction is conducted in a buffer solution having an
ionic strength of 0.01 to 5 and a pH of 5 to 10. To this
solution may be added dextran sulfate (hybridization
accelerating agent), salmon sperm DNA, bovine thymus DNA,
EDTA, a surfactant, or the like. The extracted nucleic acid
component is added to the solution, followed by heat
denaturation at 90°C or more. Insertion of a biomolecule
chip can be carried out immediately after denaturation or
after rapid cooling to 0°C. Alternatively, a hybridization
reaction can be conducted by dropping a solution on a
substrate. The rate of a reaction can be increased by
stirring or shaking during the reaction. The temperature
of a reaction is in the range of 10°C to 90°C. The time of
a reaction is in the range of one minute to about one night.
After a hybridization reaction, an electrode is removed and
then washed. For washing, a buffer solution having an ionic
strength of 0.01 to 5 and a pH of 5 to 10 can be used.
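The numerical ranges given above for the hybridization reaction may be collected into a simple check, sketched below. The class `HybridizationConditions` and the function `out_of_range` are hypothetical names introduced only for illustration. Since the text gives the upper limit of the reaction time only as "about one night", the sketch checks the lower limit of one minute alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HybridizationConditions:
    ionic_strength: float       # ionic strength of the buffer solution
    ph: float                   # pH of the buffer solution
    denaturation_temp_c: float  # heat denaturation temperature (deg C)
    reaction_temp_c: float      # hybridization temperature (deg C)
    reaction_time_min: float    # hybridization time (minutes)


def out_of_range(c: HybridizationConditions) -> List[str]:
    """Return the names of parameters outside the ranges stated in the text."""
    problems = []
    if not 0.01 <= c.ionic_strength <= 5:
        problems.append("ionic_strength")
    if not 5 <= c.ph <= 10:
        problems.append("ph")
    if c.denaturation_temp_c < 90:          # "90 deg C or more"
        problems.append("denaturation_temp_c")
    if not 10 <= c.reaction_temp_c <= 90:
        problems.append("reaction_temp_c")
    if c.reaction_time_min < 1:             # "one minute to about one night"
        problems.append("reaction_time_min")
    return problems


if __name__ == "__main__":
    ok = HybridizationConditions(0.5, 7.0, 95.0, 65.0, 60.0)
    print(out_of_range(ok))
```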

"Label" as used herein refers to an entity which
distinguishes an intended molecule or substance from other
substances (e.g., a substance, energy, electromagnetic wave,
etc.). Examples of such a labeling method include an RI
(radioisotope) method, a fluorescence method, a biotin
method, a chemiluminescence method, and the like. When both
a nucleic acid fragment and its complementary
oligonucleotide are labeled by a fluorescence method, they
are labeled with fluorescence substances having different
maximum wavelengths of fluorescence. The difference in the
maximum wavelength of fluorescence is preferably at least
10 nm. Any fluorescence substance which can bind to a base
portion of nucleic acid can be used. Preferable
fluorescence substances include cyanine dye (e.g., Cy3, Cy5,
etc. in Cy Dye™ series), a rhodamine 6G reagent,
N-acetoxy-N2-acetylaminofluorene (AAF), AAIF (an iodine
derivative of AAF), and the like. Examples of a combination
of fluorescence substances having a difference in the
maximum wavelength of fluorescence of at least 10 nm
include a combination of Cy5 and a rhodamine 6G reagent,
a combination of Cy3 and fluorescein, a combination of a
rhodamine 6G reagent and fluorescein, and the like.
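The 10 nm criterion described above may be applied to a set of candidate fluorescence substances as sketched below. The emission maxima in `APPROX_EMISSION_MAX_NM` are approximate, commonly quoted values assumed only for illustration. They are not taken from the specification and vary with solvent and conjugation. The function name `distinguishable_pairs` is likewise hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Approximate emission maxima in nm (assumed values, for illustration only).
APPROX_EMISSION_MAX_NM: Dict[str, float] = {
    "Cy3": 570.0,
    "Cy5": 670.0,
    "fluorescein": 520.0,
    "rhodamine 6G": 555.0,
}


def distinguishable_pairs(
    emission_max_nm: Dict[str, float], min_gap_nm: float = 10.0
) -> List[Tuple[str, str]]:
    """Return pairs of labels whose emission maxima differ by at least min_gap_nm."""
    return [
        (a, b)
        for a, b in combinations(sorted(emission_max_nm), 2)
        if abs(emission_max_nm[a] - emission_max_nm[b]) >= min_gap_nm
    ]


if __name__ == "__main__":
    for pair in distinguishable_pairs(APPROX_EMISSION_MAX_NM):
        print(pair)
```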

"Chip attribute data" as used herein refers to data
associated with some information relating to a biomolecule
chip of the present invention. Chip attribute data includes
information associated with a biomolecule chip, such as a
chip ID, substrate data, and biomolecule attribute data.
"Chip ID" as used herein refers to a code for identification
of each chip. "Substrate data" or "substrate attribute
data" as used herein refers to data relating to a substrate
used in a biomolecule chip of the present invention.
Substrate data may contain information relating to an
arrangement or pattern of a biomolecule. "Biomolecule
attribute data" refers to information relating to a
biomolecule, including, for example, the gene sequence of
the biomolecule (a nucleotide sequence in the case of nucleic
acid, and an amino acid sequence in the case of protein),
information relating to a gene sequence (e.g., a
relationship between the gene and a specific disease or
condition), a function in the case of a low-weight molecule
or a hormone, library information in the case of a
combinatorial library, molecular information relating to
affinity for a low-weight molecule, and the like. "Personal
information data" as used herein refers to data associated
with information for identifying an organism or subject to
be measured by a method, chip or apparatus of the present
invention. When the organism or subject is a human,
personal information data includes, but is not limited to,
age, sex, health condition, medical history (e.g., drug
history), educational background, insurance company,
personal genome information, address, name, and
the like. When the personal information data is for a
domestic animal, the information may include data about the
production company of the animal. "Measurement data" as
used herein refers to raw data obtained as a result of
measurement by a biomolecule substrate, apparatus and system
of the present invention and specific processed data derived
therefrom. Such raw data may be represented by the
intensity of an electric signal. Such processed data may
be specific biochemical data, such as a blood sugar level
or a gene expression level.

"Recording region" as used herein refers to a region
in which data may be recorded. In a recording region,
measurement data as well as the above-described chip
attribute data can be recorded.
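One possible way of organizing the chip attribute data and measurement data defined above is sketched below as a set of record types. All class and field names (`BiomoleculeAttribute`, `ChipAttributeData`, `RecordingRegion`, `record`, and so on) are hypothetical. The specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BiomoleculeAttribute:
    address: int                    # address on the substrate
    kind: str                       # e.g. "nucleic acid" or "protein"
    sequence: Optional[str] = None  # nucleotide or amino acid sequence, if any
    annotation: str = ""            # e.g. relationship to a disease or condition


@dataclass
class ChipAttributeData:
    chip_id: str                    # code identifying the chip
    substrate_data: Dict[str, str]  # e.g. arrangement or pattern information
    biomolecules: List[BiomoleculeAttribute] = field(default_factory=list)


@dataclass
class RecordingRegion:
    """Holds chip attribute data together with measurement data."""
    chip: ChipAttributeData
    measurements: Dict[int, float] = field(default_factory=dict)

    def record(self, address: int, signal: float) -> None:
        # Only addresses that carry a biomolecule may receive a measurement.
        if address not in {b.address for b in self.chip.biomolecules}:
            raise KeyError(address)
        self.measurements[address] = signal


if __name__ == "__main__":
    chip = ChipAttributeData(
        chip_id="CHIP-0001",
        substrate_data={"pattern": "grid"},
        biomolecules=[BiomoleculeAttribute(0, "nucleic acid", "ACGT")],
    )
    region = RecordingRegion(chip)
    region.record(0, 1.5)
    print(region.measurements)
```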

Techniques used herein are well known techniques
commonly used in microfluidics, micromachining, organic
chemistry, biochemistry, genetic engineering, molecular
biology, genetics, and their related fields within the
technical scope of the art, unless otherwise specified.
These techniques are sufficiently described in, for example,
the literature listed below and described elsewhere herein.

Micromachining is described in, for example, Campbell,
S. A. (1996). The Science and Engineering of Microelectronic
Fabrication, Oxford University Press; Zaut, P. V. (1996).
Microarray Fabrication: a Practical Guide to Semiconductor
Processing, Semiconductor Services; Madou, M. J. (1997).
Fundamentals of Microfabrication, CRC Press;
Rai-Choudhury, P. (1997). Handbook of Microlithography,
Micromachining, & Microfabrication: Microlithography; and
the like, related portions of which are herein incorporated
by reference.
Photolithography is a technique developed by Fodor et
al., in which a photoreactive protecting group is utilized
(see Science, 251, 767 (1991)). A protecting group for a
base inhibits a base monomer of the same or different type
from binding to that base. Thus, a base terminus to which
a protecting group is bound undergoes no new base-binding
reaction. A protecting group can be easily removed by
irradiation. Initially, amino groups having a protecting
group are immobilized throughout a substrate. Thereafter,
only spots to which a desired base is to be bound are
selectively irradiated by a method similar to a
photolithography technique usually used in a semiconductor
process, so that another base can be introduced by
subsequent binding into only the bases in the irradiated
portion. Now, desired bases having the same protecting group
at a terminus thereof are bound to such bases. Thereafter,
the pattern of a photomask is changed, and other spots are
selectively irradiated. Thereafter, bases having a
protecting group are similarly bound to the spots. This
process is repeated until a desired base sequence is
obtained in each spot, thereby preparing a DNA array.
Photolithography techniques may be used herein.
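The repeated cycle of selective irradiation and base binding described above may be modeled in a simplified way as follows. In this sketch, each step consists of a base and the set of spots that the photomask exposes. The function names `plan_masks` and `synthesize`, and the fixed cycling order A, C, G, T, are assumptions made for illustration and do not represent the actual process of Fodor et al.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
Step = Tuple[str, List[str]]  # (base to bind, spots exposed by the photomask)


def plan_masks(targets: Dict[str, str]) -> List[Step]:
    """Plan irradiation and binding steps that build each spot's target sequence."""
    for seq in targets.values():
        if any(ch not in BASES for ch in seq):
            raise ValueError("target sequences may contain only A, C, G, T")
    progress = {spot: 0 for spot in targets}
    steps: List[Step] = []
    while any(progress[s] < len(seq) for s, seq in targets.items()):
        for base in BASES:
            # Expose only the spots whose next required base is `base`.
            mask = sorted(
                s for s, seq in targets.items()
                if progress[s] < len(seq) and seq[progress[s]] == base
            )
            if mask:
                steps.append((base, mask))
                for s in mask:
                    progress[s] += 1
    return steps


def synthesize(steps: List[Step], spots) -> Dict[str, str]:
    """Apply the planned steps: only deprotected (exposed) spots gain the base."""
    grown = {s: "" for s in spots}
    for base, mask in steps:
        for s in mask:
            grown[s] += base
    return grown


if __name__ == "__main__":
    targets = {"spot1": "ACG", "spot2": "CAT"}
    plan = plan_masks(targets)
    print(plan)
    print(synthesize(plan, targets))
```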

An ink jet method (technique) is a technique of
projecting extremely small droplets onto a predetermined
position on a two-dimensional plane using heat or a
piezoelectric effect. This technique is widely used mainly
in printers. In production of a DNA array, an ink jet
apparatus is used, which has a configuration in which a
piezoelectric device is combined with a glass capillary.
A voltage is applied to the piezoelectric device which is
connected to a liquid chamber, so that the volume of the
piezoelectric device is changed and the liquid within the
chamber is expelled as a droplet from the capillary connected
to the chamber. The size of the expelled droplet is
determined by the diameter of the capillary, the volume
variation of the piezoelectric device, and the physical
properties of the liquid. The diameter of the droplet is
generally 30 µm. An ink jet apparatus using such a
piezoelectric device can expel droplets at a frequency of
about 10 kHz. In a DNA array fabricating apparatus using
such an ink jet apparatus, the ink jet apparatus and a DNA
array substrate are moved relative to each other so that
droplets can be dropped onto desired spots on the DNA array.
DNA array fabricating apparatuses using an ink jet apparatus
are roughly divided into two categories. One category
includes a DNA array fabricating apparatus using a single ink
jet apparatus, and the other includes a DNA array fabricating
apparatus using a multi-head ink jet apparatus. The DNA
array fabricating apparatus with a single ink jet apparatus
has a configuration in which a reagent for removing a
protecting group at a terminus of an oligomer is dropped
onto desired spots. A protecting group is removed from a
spot, to which a desired base is to be introduced, by using
the ink jet apparatus so that the spot is activated.
Thereafter, the desired base is subjected to a binding
reaction throughout a DNA array. In this case, the desired
base is bound only to spots having an oligomer whose terminus
is activated by the reagent dropped from the ink jet
apparatus. Thereafter, the terminus of a newly added base
is protected. Thereafter, the spot from which a protecting
group is removed is changed and the procedures are repeated
until desired nucleotide sequences are obtained. On the
other hand, in a DNA array fabricating apparatus using a
multi-head ink jet apparatus, an ink jet apparatus is
provided for each reagent containing a different base, so
that a desired base can be bound directly to each spot. A
DNA array fabricating apparatus using a multi-head ink jet
apparatus can have a higher throughput than that of a DNA
array fabricating apparatus using a single ink jet apparatus.

Among methods for fixing a presynthesized oligonucleotide
to a substrate is a mechanical microspotting technique in
which liquid containing an oligonucleotide, which is
attached to the tip of a stainless pin, is mechanically
pressed against a substrate so that the oligonucleotide is
immobilized on the substrate. The size of a spot obtained
by this method is 50 to 300 µm. After microspotting,
subsequent processes, such as immobilization using UV light,
are carried out.
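The droplet figures given above for the ink jet method can be put into perspective with simple arithmetic, sketched below under the assumption that a droplet is a sphere and that one droplet is expelled per cycle. The function names `droplet_volume_pl` and `deposition_time_s` are hypothetical.

```python
import math


def droplet_volume_pl(diameter_um: float) -> float:
    """Volume in picolitres of a spherical droplet of the given diameter."""
    radius_m = diameter_um * 1e-6 / 2.0
    volume_m3 = (4.0 / 3.0) * math.pi * radius_m ** 3
    return volume_m3 / 1e-15  # 1 pL = 1e-15 m^3


def deposition_time_s(n_droplets: int, frequency_hz: float) -> float:
    """Time to expel n_droplets at the given frequency, ignoring stage movement."""
    return n_droplets / frequency_hz


if __name__ == "__main__":
    print(round(droplet_volume_pl(30.0), 2))     # 30 um droplet
    print(deposition_time_s(10_000, 10_000.0))   # 10^4 droplets at 10 kHz
```

Under these assumptions, a 30 µm droplet holds roughly 14 pL, and 10^4 droplets can be expelled in about one second, not counting the time needed to move the apparatus relative to the substrate.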

DESCRIPTION OF PREFERRED EMBODIMENTS

Hereinafter, preferred embodiments of the present
invention will be described. The following embodiments are
provided for a better understanding of the present invention
and the scope of the present invention should not be limited
to the following description. It will be clearly
appreciated by those skilled in the art, with reference to
the specification, that variations and modifications can be
made without departing from the scope of the present
invention.

Next, a novel gene targeted-disruption technique, a 
feature of the present invention, is described. 

In one aspect, the present invention provides a
method for targeted disruption of an arbitrary gene in a
genome of a living organism. The subject method comprises
the steps of: A) providing information on the entire sequence
of the genome of the living organism; B) selecting at least
one arbitrary region of the sequence; C) providing a vector
comprising a sequence complementary to the selected region
and a marker gene; D) transforming the living organism with
the vector; and E) placing the living organism under a
condition allowing homologous recombination to occur. The
method has been made possible for the first time by
clarifying the entire genomic sequence. It differs from the
conventional technology in that, for example, the model
system using Sulfolobus solfataricus by Bartolucci S. cannot
disrupt a desired gene and can merely utilize the result of
accidental disruption. In the present invention, this
difference makes it possible to disrupt a desired gene
rapidly and efficiently, and thereby allows functional
analysis.

Preferably, in step B) of the present invention,
the region comprises at least two regions. By having two
such regions, targeted disruption of genes by double
cross-over becomes available. As demonstrated in the present
invention, targeted disruption of a gene by double
cross-over is generally more efficient than
targeted disruption of a gene by single cross-over.
Accordingly, it is preferable to have two such regions.
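Steps A) to C), in the preferred case of two selected regions, may be illustrated in silico as follows. This is a minimal sketch under several assumptions that are not stated in the specification: the two regions are taken to be the sequences immediately flanking the target gene, the marker gene is placed between them, coordinates are 0-based with an exclusive end, and the circularity of the genome is ignored. The function names `homology_arms` and `disruption_cassette` are hypothetical.

```python
from typing import Tuple


def homology_arms(genome: str, start: int, end: int, arm_len: int) -> Tuple[str, str]:
    """Return the (upstream, downstream) regions flanking genome[start:end]."""
    if not 0 <= start < end <= len(genome):
        raise ValueError("target coordinates lie outside the sequence")
    if arm_len < 1 or start - arm_len < 0 or end + arm_len > len(genome):
        raise ValueError("arm length does not fit within the sequence")
    return genome[start - arm_len:start], genome[end:end + arm_len]


def disruption_cassette(genome: str, start: int, end: int,
                        marker: str, arm_len: int) -> str:
    """Insert sequence of a disruption vector: upstream arm, marker, downstream arm."""
    upstream, downstream = homology_arms(genome, start, end, arm_len)
    return upstream + marker + downstream


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"   # toy sequence, not real data
    print(disruption_cassette(toy_genome, 6, 10, "xyz", 3))
```

Because both arms must be read directly from the genome, the sketch also shows why the entire sequence around an arbitrary target gene needs to be known before such a vector can be designed.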

Vectors used in the present invention are also called
disruption vectors, and may further comprise an additional
gene regulatory element such as a promoter.

The gene targeting method of the present invention may
further comprise the step of detecting an expression product
of the marker gene. As used herein, the expression product
may be, for example, an mRNA, a polypeptide, or a
post-translationally modified polypeptide.
In one embodiment, the marker gene is located inside or
outside the selected region.

The genome used in the present
invention may be any genome as long as the entire genomic
sequence is substantially sequenced. Examples of such a
genome include, but are not limited to, the genomes of
archaebacteria such as Aeropyrum pernix, Archaeoglobus
fulgidus, Methanobacterium thermoautotrophicum,
Methanococcus jannaschii, Pyrococcus abyssi, Pyrococcus
furiosus, Pyrococcus horikoshii, Sulfolobus solfataricus,
Sulfolobus tokodaii, Thermoplasma acidophilum, and
Thermoplasma volcanium; bacteria such as Aquifex aeolicus and
Thermotoga maritima; and the like. In one embodiment, the
genome used may be the genome of Thermococcus kodakaraensis
KOD1, because the entire genome of Thermococcus
kodakaraensis KOD1 has now been sequenced. As used herein,
the statement that the entire sequence has been sequenced or
substantially sequenced means that the sequence has been
clarified to such an extent that, for any regional sequence
selected, a region sufficiently homologous for causing
homologous recombination can be provided. Accordingly, it is
preferable that the entire sequence is sequenced without a
single base missing; however, it is permissible to have one,
two, or three bases unclarified in a sequence. A plurality
of such unclarified sequences may be present as long as, for
any regional sequence selected, a region sufficiently
homologous for causing homologous recombination may be
provided.
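The tolerance for unclarified bases described above may be checked for a candidate region as sketched below. The sketch assumes that unclarified bases are written as "N" and reads the text as allowing any number of unclarified stretches provided that each stretch is at most three bases long. This reading, and the function names `unclarified_runs` and `usable_for_recombination`, are assumptions made for illustration. Whether a region is in fact sufficiently homologous for recombination cannot be decided by such a check alone.

```python
import re
from typing import List


def unclarified_runs(region: str) -> List[int]:
    """Lengths of consecutive stretches of unclarified bases ('N') in the region."""
    return [len(m.group(0)) for m in re.finditer(r"N+", region.upper())]


def usable_for_recombination(region: str, max_run: int = 3) -> bool:
    """True if no stretch of unclarified bases is longer than max_run."""
    return all(run <= max_run for run in unclarified_runs(region))


if __name__ == "__main__":
    print(unclarified_runs("ACGTNNACGNT"))
    print(usable_for_recombination("ACGTNNNACGT"))
    print(usable_for_recombination("ACNNNNGT"))
```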

Preferably, the genome of the present invention has 
a sequence set forth in SEQ ID NO: 1. 
Preferably, in the method of the present invention,
the above-mentioned selected region is an open reading
frame of SEQ ID NO: 1, which is selected from the group
of sequences of gene numbers (1) to (2151) in the following
Table in the sequence of SEQ ID NO: 1, 342, 723, 1087, 1469
or 1838.
                                 1155948   533
 981   933728   933904  1155650  1155474   534
 982   933919   934392  1155459  1154986  1308
 983   934564   935379  1154814  1153999   176
 984   935513   936664  1153865  1152714  2022
 985   936666   936944  1152712  1152434  1696
 986   936987   938822  1152391  1150556  1695
 987   938954   940192  1150424  1149186   535
 988   940239   940469  1149139  1148909   903
 989   940803   940937  1148575  1148441   904
 990   940934   942055  1148444  1147323   536
 991   942591   942917  1146787  1146461   905
 992   942914   943306  1146464  1146072  2021
 993   943357   943545  1146021  1145833  1307
 994   943533   943778  1145845  1145600  1694
 995   943889   944536  1145489  1144842  2020
 996   944542   944994  1144836  1144384  1306
 997   944996   945436  1144382  1143942  2019
 998   945433   945741  1143945  1143637  1305
 999   945755   946939  1143623  1142439  2018
1000   946932   948164  1142446  1141214  1693
1001   948079   949662  1141299  1139716  1304
1002   949659   953030  1139719  1136348  1692
1003   953048   953296  1136330  1136082  2017
1004   953495   954190  1135883  1135188  2016
1005   954301   955020  1135077  1134358   177
1006   955204   956391  1134174  1132987   178
1007   956375   956533  1133003  1132845  2015
1008   957270   957638  1132108  1131740   906
1009   957640   961329  1131738  1128049  1303
1010   961407   962324  1127971  1127054   907
1011   962372   962575  1127006  1126803   537
1012   962593   963804  1126785  1125574  1302
1013   964168   964827  1125210  1124551   179
1014   964831   965430  1124547  1123948  1301
1015   965603   965896  1123775  1123482   538
1016   965901   966098  1123477  1123280   908
1017   966166   967002  1123212  1122376   180
1018   967002   967181  1122376  1122197   909
1019   967184   967987  1122194  1121391   539
1020   968134   968757  1121244  1120621   181
1021   968754   969002  1120624  1120376   910
1022   968995   969663  1120383  1119715   182
1023   969660   970463  1119718  1118915   911
1024   970555   971892  1118823  1117486   183
1025   971952   973340  1117426  1116038  1691
1026   973366   974772  1116012  1114606  1300
1027   974823   976277  1114555  1113101  1690
1028   976234   976803  1113144  1112575  1299
1029   976871   977053  1112507  1112325  2014
1030   977082   977765  1112296  1111613  1689
1031   977762   978706  1111616  1110672  2013
1032   978776   979747  1110602  1109631   540
1033   979826   981100  1109552  1108278   541
1034   981159   981425  1108219  1107953  1688
1035   981762   981815  1107616  1107563  1687
1036   982136   982483  1107242  1106895   542
1037   982480   982953  1106898  1106425  1298
1038   983025   983486  1106353  1105892   912
1039   983483   983821  1105895  1105557   543
1040   983802   984371  1105576  1105007  1686
1041   984359   985399  1105019  1103979  2012
1042   985204   986352  1104174  1103026  1297
1043   986349   986912  1103029  1102466  1685
1044   986851   987246  1102527  1102132  1296
1045   987243   987566  1102135  1101812  1684
1046   987517   988383  1101861  1100995  1295
1047   988383   989573  1100995  1099805  1683
1048   989577   989894  1099801  1099484  1682
1049   990762   991511  1098616  1097867   913
1050   991803   991991  1097575  1097387   914
1051   992036   993010  1097342  1096368  2011
1052   994241   995020  1095137  1094358   544
1053   995047   995112  1094331  1094266   184
1054   995380   995844  1093998  1093534   185
1055   995878   996558  1093500  1092820  1294
1056   997037   998464  1092341  1090914   545
1057   998525   999265  1090853  1090113  2010
1058   999750  1000229  1089628  1089149   915
1059  1000226  1001212  1089152  1088166   546
1060  1001217  1001987  1088161  1087391   916
1061  1002002  1003240  1087376  1086138  2009
1062  1003253  1005466  1086125  1083912   547
1063  1005467  1006087  1083911  1083291  2008
1064  1006202  1007890  1083176  1081488  2007
1065  1007979  1010192  1081399  1079186  1681
1066  1010189  1010956  1079189  1078422  2006
1067  1011011  1011949  1078367  1077429  2005
1068  1012013  1012879  1077365  1076499   548
1069  1012961  1013278  1076417  1076100   549
1070  1013371  1013883  1076007  1075495   186
1071  1013995  1014411  1075383  1074967  1293
1072  1014829  1017228  1074549  1072150   187
1073  1017331  1020711  1072047  1068667   188
1074  1020821  1020970  1068557  1068408  2004
1075  1021424  1022338  1067954  1067040   550
1076  1022319  1023311  1067059  1066067  1680
1077  1023301  1023780  1066077  1065598  1292
1078  1023781  1024785  1065597  1064593  1291
1079  1024877  1025692  1064501  1063686   551
1080  1025682  1026086  1063696  1063292  1679
1081  1026083  1026376  1063295  1063002  2003
1082  1026357  1026986  1063021  1062392  1678
1083  1026983  1027579  1062395  1061799  2002
1084  1027657  1029558  1061721  1059820   189
1085  1029517  1030068  1059861  1059310  1290
1086  1030276  1030950  1059102  1058428  1289
1087  1031013  1031807  1058365  1057571  1677
1088  1031814  1032344  1057564  1057034  1676
1089  1032406  1032792  1056972  1056586   190
1090  1032841  1034373  1056537  1055005   191
1091  1034458  1035498  1054920  1053880   192
1092  1035541  1036101  1053837  1053277   193
1093  1036098  1036649  1053280  1052729   917
1094  1036636  1037469  1052742  1051909   194
1095  1037390  1038229  1051988  1051149  2001
1096  1038226  1039704  1051152  1049674  1288
1097  1039796  1040683  1049582  1048695   552
1098  1041012  1041071  1048366  1048307   918
1099  1041624  1041935  1047754  1047443   919
1100  1042133  1042384  1047245  1046994   553
1101  1042526  1043701  1046852  1045677   554
1102  1043676  1044812  1045702  1044566  1675
1103  1044809  1046068  1044569  1043310  2000
1104  1047016  1048092  1042362  1041286   195
1105  1048209  1048610  1041169  1040768  1674
1106  1048684  1048761  1040694  1040617  1287
1107  1048718  1049599  1040660  1039779   555
1108  1049596  1051275  1039782  1038103  1286
1109  1051307  1051711  1038071  1037667  1999
1110  1051708  1051995  1037670  1037383  1285
1111  1052192  1052701  1037186  1036677   556
1112  1052753  1053022  1036625  1036356   557
1113  1053032  1053793  1036346  1035585   558
1114  1053859  1055274  1035519  1034104   196
1115  1055358  1055663  1034020  1033715   920
1116  1056285  1056395  1033093  1032983   921
1117  1056392  1057381  1032986  1031997  1998
1118  1057362  1057835  1032016  1031543  1673
1119  1057832  1058302  1031546  1031076  1997
1120  1058495  1059043  1030883  1030335   559
1121  1059047  1059307  1030331  1030071  1996
1122  1059399  1059863  1029979  1029515  1672
1123  1059921  1060517  1029457  1028861   922
1124  1060582  1061310  1028796  1028068   197
1125  1061307  1061768  1028071  1027610  1671
1126  1061878  1063221  1027500  1026157   198
1127  1063298  1064599  1026080  1024779   560
1128  1064656  1065000  1024722  1024378  1284
1129  1065370  1066023  1024008  1023355  1283
1130  1066020  1067213  1023358  1022165  1670
1131  1067215  1067811  1022163  1021567  1282
1132  1067793  1068392  1021585  1020986  1669
1133  1068394  1069287  1020984  1020091  1281
1134  1069288  1071138  1020090  1018240  1280
1135  1070858  1070965  1018520  1018413   561
1136  1071135  1072622  1018243  1016756  1668
1137  1072619  1072963  1016759  1016415  1995
1138  1072960  1073688  1016418  1015690  1279
1139  1073670  1073954  1015708  1015424  1667
1140  1073951  1074343  1015427  1015035  1994
1141  1074340  1074594  1015038  1014784  1278
1142  1074591  1075124  1014787  1014254  1666
1143 


1075360 


1075860 


1014018 


1013518 


1277 


1144 


1076013 


1077278 


1013365 


1012100 


923 


1145 


1077432 


1077986 


1011946 


1011392 


924 


1146 


1078071 


1079189 


1011307 


1010189 


1665 


1147 


1079201 


1080472 


1010177 


1008906 


1993 


1148 


1080723 


1081862 


1008655 


1007516 


925 


1149 


1082285 


1084639 


1007093 


1004739 


562 


1150 


1082363 


1082779 


1007015 


1006599 


1992 


1151 


1084640 


1085716 


1004738 


1003662 


1991 


1152 


1085820 


1086698 


1003558 


1002680 


926 


1153 


1086762 


1086986 


1002616 


1002392 


927 


1154 


1087256 


1088512 


1002122 


1000866 


1990 


1155 


1088568 


1088813 


1000810 


1000565 


1664 


1156 


1088815 


1089384 


1000563 


999994 


1276 


1157 


1089160 


1089210 


1000218 


1000168 


199 


1158 


1089484 


1089639 


999894 


999739 


1275 


1159 


1089909 


1090604 


999469 


998774 


1663 


1160 


1091118 


1091525 


998260 


997853 


1662 


1161 


1091646 


1092197 


997732 


997181 


928 


1162 


1092206 


1093522 


997172 


995856 


1989 


1163 


1093556 


1093957 


995822 


995421 


1988 


1164 


1093967 


1095127 


995411


994251 


1987 


1165 


1096375 


1096839 


993003 


992539 


200 


1166 


1096870 


1098303 


992508 


991075 


201 


1167 


1098281 


1098538 


991097 


990840 


563 


1168 


1098554 


1099156 


990824 


990222 


564 


1169 


1099220 


1099486 


990158 


989892 


565 


1170 


1099468 


1099908 


989910 


989470 


202 


1171 


1099954 


1100991 


989424 


988387 


203 


1172 


1101073 


1101510 


988305 


987868 


1274 


1173 


1101868 


1102326 


987510 


987052 


1273 


1174 


1102786 


1103181 


986592 


986197 


1272 



1175 


1103673 


1104461 


985705 


984917 


1661 


1176 


1104585 


1106492 


984793 


982886 


929 


1177 


1106686 


1107264 


982692 


982114 


1271 


1178 


1107524 


1108015 


981854 


981363 


1986 


1179 


1108559 


1110253 


980819 


979125 


1985 


1180 


1110347 


1111819 


979031 


977559 


566 


1181 


1111862 


1112080 


977516 


977298 


1984 


1182 


1112624 


1113001 


976754 


976377 


1983 


1183 


1113459 


1114217 


975919 


975161 


930 


1184 


1114407 


1117082 


974971 


972296 


931 


1185 


1117577 


1118029 


971801 


971349 


567 


1186 


1118086 


1119738 


971292 


969640 


1270 


1187 


1119840 


1120178 


969538 


969200 


932 


1188 


1120172 


1120504 


969206 


968874 


568 


1189 


1120505 


1121407 


968873 


967971 


569 


1190 


1121408 


1122520 


967970 


966858 


1982 


1191 


1122517 


1123746 


966861 


965632 


1269 


1192 


1123810 


1124472 


965568 


964906 


204 


1193 


1124569 


1125114 


964809 


964264 


1268 


1194 


1125170 


1125637 


964208 


963741 


1981 


1195 


1125727 


1126902 


963651 


962476 


205 


1196 


1128262 


1128495 


961116 


960883 


1267 


1197 


1128535 


1128972 


960843 


960406 


1266 


1198 


1129034 


1130476 


960344 


958902 


1980 


1199 


1130532 


1131944 


958846 


957434 


1660 


1200 


1132006 


1132422 


957372 


956956 


1265 


1201 


1132432 


1132659 


956946 


956719 


1264 


1202 


1132744 


1135125 


956634 


954253 


1263 


1203 


1135154 


1135213 


954224 


954165 


570 


1204 


1135255 


1137741 


954123 


951637 


1262 


1205 


1138634 


1138867 


950744 


950511


571 


1206 


1139159 


1142494 


950219 


946884 


572 



1207 


1142537 


1142836 


946841 


946542 


573 


1208 


1142873 


1144054 


946505 


945324 


574 


1209 


1144054 


1145121 


945324 


944257 


206 


1210 


1145177 


1146514 


944201 


942864 


575 


1211 


1146553 


1148040 


942825 


941338 


207 


1212 


1148086 


1149231 


941292 


940147 


208 


1213 


1150093 


1151094 


939285 


938284 


209 


1214 


1151091 


1154534 


938287 


934844 


1659 


1215 


1155108 


1155464 


934270 


933914 


933 


1216 


1155466 


1155999 


933912 


933379 


1261 


1217 


1157418 


1157627 


931960 


931751 


1658 


1218 


1157624 


1157836 


931754 


931542 


1979 


1219 


1157916 


1158293 


931462 


931085 


1657 


1220 


1158361 


1159554 


931017 


929824 


1260 


1221 


1159686 


1160306 


929692 


929072 


1656 


1222 


1161299 


1161634 


928079 


927744 


1978 


1223 


1161690 


1163606 


927688 


925772 


1655 


1224 


1163703 


1164656 


925675 


924722 


934 


1225 


1164663 


1165082 


924715 


924296 


935 


1226 


1165121 


1165714 


924257 


923664 


576 


1227 


1165724 


1165948 


923654 


923430 


577 


1228 


1165959 


1166231 


923419 


923147 


936 


1229 


1166259 


1166948 


923119 


922430 


937 


1230 


1167001 


1167234 


922377 


922144 


210 


1231 


1167503 


1168657 


921875 


920721 


1977 


1232 


1168678 


1169472 


920700 


919906 


1259 


1233 


1169576 


1171024 


919802 


918354 


1976 


1234 


1171021 


1171905 


918357 


917473 


1258 


1235 


1172047 


1172277 


917331 


917101 


211 


1236 


1172264 


1173025 


917114 


916353 


1975 


1237 


1173022 


1173636 


916356 


915742 


1257 


1238 


1173687 


1174022 


915691 


915356 


938 



1239 


1174023 


1174274 


915355 


915104 


1654 


1240 


1174284 


1174388 


915094 


914990 


1653 


1241 


1174493 


1177870


914885 


911508 


578 


1242 


1178296 


1178862 


911082 


910516 


212 


1243 


1178840 


1179322 


910538 


910056 


579 


1244 


1179335 


1180606 


910043 


908772 


1974 


1245 


1180603 


1181361 


908775 


908017 


1256 


1246 


1181719 


1181916 


907659 


907462 


1255 


1247 


1182281 


1182673 


907097 


906705 


1973 


1248 


1182899 


1183855 


906479 


905523 


580 


1249 


1184435 


1184731 


904943 


904647 


1972 


1250 


1184832 


1185752 


904546 


903626 


1652 


1251 


1186264 


1186524 


903114 


902854 


1254 


1252 


1187372 


1187653 


902006 


901725 


1971 


1253 


1188250 


1188906 


901128 


900472 


1253 


1254 


1188962 


1189906 


900416 


899472 


1970 


1255 


1189940 


1190062


899438 


899316 


1969 


1256 


1191309 


1191941 


898069 


897437 


1651 


1257 


1195773 


1195841 


893605 


893537 


939 


1258 


1196421 


1196939 


892957 


892439 


1650 


1259 


1197121 


1197330 


892257 


892048 


1252 


1260 


1197327 


1197827 


892051 


891551 


1649 


1261 


1197859 


1198116 


891519 


891262 


1251 


1262 


1198129 


1198395 


891249 


890983 


1250 


1263 


1198775 


1198969 


890603 


890409 


581 


1264 


1199210 


1199536 


890168 


889842 


1968 


1265 


1200465 


1200542 


888913 


888836 


940 


1266 


1202741 


1204258


886637 


885120 


1967 


1267 


1204260 


1205624


885118 


883754 


1648 


1268 


1205780 


1207075 


883598 


882303 


1966 


1269 


1207362 


1207793 


882016 


881585 


941 


1270 


1207790 


1208482 


881588 


880896 


582 



1271 


1209464 


1210141 


879914 


879237 


583 


1272 


1210174 


1210893 


879204 


878485 


213 


1273 


1210890 


1211111 


878488 


878267 


942 


1274 


1211128 


1211787 


878250 


877591 


214 


1275 


1211850 


1212755 


877528 


876623 


943 


1276 


1212760 


1213104 


876618 


876274 


1249 


1277 


1213101 


1214369 


876277 


875009 


1647 


1278 


1214366 


1215214 


875012 


874164 


1965 


1279 


1215250 


1215861 


874128 


873517 


1248 


1280 


1217374 


1217490 


872004 


871888 


215 


1281 


1219074 


1219190 


870304 


870188 


944 


1282 


1219197 


1220690 


870181 


868688 


1646 


1283 


1220740 


1221513 


868638 


867865 


1247 


1284 


1221503 


1222201 


867875 


867177 


1964 


1285 


1222282 


1223655 


867096 


865723 


216 


1286 


1223758 


1225113 


865620 


864265 


217 


1287 


1225113 


1225991 


864265 


863387 


945 


1288 


1226169 


1226861 


863209 


862517 


946 


1289 


1227076 


1227702 


862302 


861676 


1246 


1290 


1227756 


1228466 


861622 


860912 


1645 


1291 


1228622 


1230493 


860756 


858885 


584 


1292 


1230580


1233081 


858798 


856297 


218 


1293 


1233236 


1234546 


856142 


854832 


585 


1294 


1234563 


1236284 


854815 


853094 


1644 


1295 


1236584 


1237978 


852794 


851400 


1963 


1296 


1237975 


1238376 


851403 


851002 


1245 


1297 


1238433 


1239707 


850945 


849671 


1643 


1298 


1239791 


1239994 


849587 


849384 


1962 


1299 


1240125 


1240214 


849253 


849164 


947 


1300 


1240801 


1240896 


848577 


848482 


1244 


1301 


1241592 


1241921 


847786 


847457 


1642 


1302 


1241983 


1243014 


847395 


846364 


1243 



1303 


1243011 


1243661 


846367 


845717 


1641 


1304 


1243692 


1243778 


845686 


845600 


1640 


1305 


1243775 


1244272 


845603 


845106 


1961 


1306 


1244307 


1244765 


845071 


844613 


1639 


1307 


1244788 


1244973 


844590 


844405 


1242 


1308 


1245004 


1246125 


844374 


843253 


1241 


1309 


1246241 


1247059 


843137 


842319 


1960 


1310 


1247369 


1248709 


842009 


840669 


1959 


1311 


1248621 


1249226 


840757 


840152 


948 


1312 


1250499 


1251188 


838879 


838190 


1638 


1313 


1251193 


1251561 


838185 


837817 


1240 


1314 


1251632 


1253578 


837746 


835800 


1958 


1315 


1253588 


1253788 


835790 


835590 


1957 


1316 


1254304 


1255470 


835074 


833908 


219 


1317 


1255582


1256436 


833796 


832942 


1239 


1318 


1256379 


1256846 


832999 


832532 


1637 


1319 


1257402 


1258961 


831976 


830417 


949 


1320 


1258972 


1259079 


830406 


830299 


220 


1321 


1259124 


1259858 


830254 


829520 


950 


1322 


1259855 


1260172 


829523 


829206 


1956 


1323 


1260229 


1262256 


829149 


827122 


1238 


1324 


1262388 


1262651 


826990 


826727 


951 


1325 


1262709 


1264661 


826669 


824717 


952 


1326 


1264658 


1265074 


824720 


824304 


1955 


1327 


1265145 


1265591 


824233 


823787 


953 


1328 


1265593 


1266390 


823785 


822988 


221 


1329 


1266750 


1267955 


822628 


821423 


954 


1330 


1268130 


1269137 


821248 


820241 


1636 


1331 


1269155


1270042 


820223 


819336 


1954 


1332 


1270062 


1271162 


819316 


818216 


1635 


1333 


1271162 


1272181 


818216 


817197 


1953 


1334 


1272174 


1273103 


817204 


816275 


1634 



1335


1273100 


1274158 


816278 


815220 


1952 


1336 


1274151 


1275281 


815227 


814097 


1633 


1337 


1275461 


1276135 


813917 


813243 


1951 


1338 


1276120 


1276689 


813258 


812689 


1237 


1339 


1276727 


1278301 


812651 


811077 


1950 


1340 


1278636 


1279535 


810742 


809843 


1632 


1341 


1279958 


1280587 


809420 


808791 


1949 


1342 


1280661 


1281740 


808717 


807638 


955 


1343 


1281804 


1282397 


807574 


806981 


1631 


1344 


1282384 


1283034 


806994 


806344 


1236 


1345 


1283055 


1284251 


806323 


805127 


1630 


1346 


1284667 


1285869 


804711 


803509 


222 


1347 


1285975 


1289823 


803403 


799555 


223 


1348 


1290019 


1292922 


799359 


796456 


224 


1349 


1293396 


1293860 


795982 


795518 


1629 


1350 


1294892 


1295722 


794486 


793656 


586 


1351 


1295748 


1297115 


793630 


792263 


956 


1352 


1297116 


1298444 


792262 


790934 


1628 


1353 


1298625 


1298846 


790753 


790532 


957 


1354 


1299189 


1300220 


790189 


789158 


1627 


1355 


1300290 


1301624 


789088 


787754 


1626 


1356 


1301759 


1302934 


787619 


786444 


1948 


1357 


1302931 


1303617 


786447 


785761 


1235 


1358 


1303690 


1304454 


785688 


784924 


1234 


1359 


1304451 


1305239 


784927 


784139 


1625 


1360 


1305236 


1306249 


784142 


783129 


1947 


1361 


1306246 


1306722 


783132 


782656 


1233 


1362 


1306665 


1307039 


782713 


782339 


1624 


1363 


1307076 


1307963 


782302 


781415 


1623 


1364 


1307989 


1309053 


781389 


780325 


1232 


1365 


1309106 


1309948 


780272 


779430 


587 


1366 


1309950 


1311020 


779428 


778358 


958 



1367


1311965 


1313317 


777413 


776061 


1946 


1368 


1313412 


1314224 


775966 


775154 


1622 


1369 


1315661 


1315879 


773717 


773499 


1945 


1370 


1316041 


1316151 


773337 


773227 


1231 


1371 


1316410 


1317765 


772968 


771613 


225 


1372 


1317762 


1318001 


771616 


771377 


959 


1373 


1317998 


1318528 


771380 


770850 


588 


1374 


1318585 


1319298 


770793 


770080 


226 


1375 


1319308 


1319637 


770070 


769741 


227 


1376 


1319620 


1320078 


769758 


769300 


1230 


1377 


1321326 


1322096 


768052 


767282 


960 


1378 


1322102 


1322401 


767276 


766977 


1944 


1379 


1322840 


1323004 


766538 


766374 


1943 


1380 


1323183 


1323788 


766195 


765590 


1621 


1381 


1323802 


1324827 


765576 


764551 


1229 


1382 


1325139 


1325336 


764239 


764042 


1620 


1383 


1325369 


1325800 


764009 


763578 


1942 


1384 


1325787 


1326215 


763591 


763163 


1619 


1385 


1326222 


1326593 


763156 


762785 


1618 


1386 


1326738 


1327526 


762640 


761852 


1617 


1387 


1327548 


1327970 


761830 


761408 


1616 


1388 


1327967 


1328509 


761411 


760869 


1941 


1389 


1328520 


1329077 


760858 


760301 


1615 


1390 


1329084 


1329671 


760294 


759707 


1614 


1391 


1330058 


1330213 


759320 


759165 


589 


1392 


1330540 


1331565 


758838 


757813 


1228 


1393 


1331777 


1332007 


757601 


757371 


1940 


1394 


1332043 


1332753 


757335 


756625 


1227 


1395 


1332861 


1333112 


756517 


756266 


1613 


1396 


1333113 


1333694 


756265 


755684 


1612 


1397 


1333706 


1333999 


755672 


755379 


1939 


1398 


1334020 


1334550 


755358 


754828 


1226 



1399


1334537 


1335136 


754841 


754242 


1938 


1400 


1335210 


1336667 


754168 


752711 


1611 


1401 


1336699 


1337145 


752679 


752233 


1225 


1402 


1337157 


1337624 


752221 


751754 


1610 


1403 


1337636 


1338343 


751742 


751035 


1937 


1404 


1338340 


1338954 


751038 


750424 


1224 


1405 


1338956 


1339411 


750422 


749967 


1936 


1406 


1339413 


1339793 


749965 


749585 


1609 


1407 


1339810 


1340373 


749568 


749005 


1223 


1408 


1340375 


1340767 


749003 


748611


1935 


1409 


1340779 


1340949 


748599 


748429 


1222 


1410 


1340951 


1341502 


748427 


747876 


1934 


1411 


1341516 


1342247 


747862 


747131 


1608 


1412 


1342247 


1342612 


747131 


746766 


1933 


1413 


1342624 


1343049 


746754 


746329 


1221 


1414 


1343053 


1343406 


746325 


745972 


1220 


1415 


1343394 


1343660 


745984 


745718 


1607 


1416 


1343657 


1343953 


745721 


745425 


1932 


1417 


1343960 


1344160 


745418 


745218 


1931 


1418 


1344147 


1344785 


745231 


744593 


1606 


1419 


1344782 


1345252 


744596 


744126 


1930 


1420 


1345263 


1345673 


744115


743705 


1605 


1421 


1345670 


1346398 


743708 


742980 


1929 


1422 


1346403 


1346663 


742975 


742715 


1604 


1423 


1346670 


1347437 


742708 


741941 


1603 


1424 


1347448


1348488 


741930 


740890 


1219 


1425 


1348490 


1349344 


740888 


740034 


1928 


1426 


1349882 


1351258 


739496 


738120 


1927 


1427 


1351322 


1352506 


738056 


736872 


1926 


1428 


1352613 


1353269 


736765 


736109 


1602 


1429 


1354574 


1355740 


734804 


733638 


590 


1430 


1355821 


1356402 


733557 


732976 


1218 



1431


1356606 


1357514 


732772 


731864 


961 


1432 


1357517 


1358350 


731861 


731028 


1925 


1433 


1358441 


1359433 


730937 


729945 


1924 


1434 


1361181 


1362461 


728197 


726917 


962 


1435 


1362449 


1362523 


726929 


726855 


591 


1436 


1363010 


1363930 


726368 


725448 


1923 


1437 


1363972 


1365465 


725406 


723913 


1217 


1438 


1365589 


1366155 


723789 


723223 


228 


1439 


1366195 


1367346 


723183 


722032 


229 


1440 


1367357 


1368481 


722021 


720897 


592 


1441 


1368582 


1369193 


720796 


720185 


963 


1442 


1369248 


1370567 


720130 


718811 


964 


1443 


1370627 


1370989 


718751 


718389 


1922 


1444 


1371847 


1372125 


717531 


717253 


230 


1445 


1372322 


1373752 


717056 


715626 


593 


1446 


1373902


1376664 


715476 


712714 


231 


1447 


1376921 


1378402 


712457 


710976 


594 


1448 


1378470 


1379534 


710908 


709844 


1601 


1449 


1379649 


1380014 


709729 


709364 


965 


1450 


1379981 


1380445 


709397 


708933 


1921 


1451 


1380532 


1381284 


708846 


708094 


1216 


1452 


1381281 


1382687 


708097 


706691 


1600 


1453 


1382767


1384572 


706611 


704806 


232 


1454 


1384569 


1385354 


704809 


704024 


1599 


1455 


1385351 


1385914 


704027 


703464 


1920 


1456 


1386061 


1387578 


703317 


701800 


1215 


1457 


1387922 


1388011 


701456 


701367 


595 


1458 


1388004 


1389050 


701374 


700328 


1598 


1459 


1388485 


1388589 


700893 


700789 


233 


1460 


1389047


1389982 


700331 


699396 


1919 


1461 


1390108 


1390617 


699270 


698761 


234 


1462 


1390656 


1391165 


698722 


698213 


966 



1463


1391397 


1391669 


697981 


697709 


967 


1464 


1393980 


1394540 


695398 


694838 


968 


1465 


1396169 


1396951 


693209 


692427 


596 


1466 


1396965 


1397522 


692413 


691856 


969 


1467 


1397528 


1397968 


691850 


691410 


1918 


1468 


1398271 


1399176 


691107 


690202 


235 


1469 


1399173 


1400693 


690205 


688685 


970 


1470 


1400690 


1401382 


688688 


687996 


597 


1471 


1401502 


1401813 


687876 


687565 


236 


1472 


1401815 


1403806 


687563 


685572 


598 


1473 


1403824 


1404309 


685554 


685069 


237 


1474 


1404349 


1404960 


685029 


684418 


238 


1475 


1404957 


1406060 


684421 


683318 


971 


1476 


1406057 


1406365 


683321 


683013 


599 


1477 


1406372 


1407382 


683006 


681996 


600 


1478 


1407475 


1408257 


681903 


681121 


239 


1479 


1408254 


1409654 


681124 


679724 


972 


1480 


1409674 


1410327 


679704 


679051 


240 


1481 


1410413 


1411189 


678965 


678189 


601 


1482 


1411199 


1411954 


678179 


677424 


602 


1483 


1411938 


1413167 


677440 


676211 


973 


1484 


1413235 


1413960 


676143 


675418 


241 


1485 


1413935 


1414642 


675443 


674736 


603 


1486 


1414943 


1415797 


674435 


673581 


604 


1487 


1415800 


1418658 


673578 


670720 


1214 


1488 


1418655 


1420457 


670723 


668921 


1597 


1489 


1420450 


1420923 


668928 


668455 


1213 


1490 


1421049 


1422080 


668329 


667298 


1596 


1491 


1422217 


1422759 


667161 


666619 


242 


1492 


1422740 


1423594 


666638 


665784 


1917 


1493 


1423617 


1424129 


665761 


665249 


1595 


1494 


1424266 


1424787 


665112 


664591 


243


1495


1424787 


1428260 


664591 


661118 


974 


1496 


1428306 


1428734 


661072 


660644 


975 


1497 


1428842 


1430410 


660536 


658968 


605 


1498 


1430421 


1430807 


658957 


658571 


976 


1499 


1430801 


1431283 


658577 


658095 


606 


1500 


1431290 


1432483 


658088 


656895 


607 


1501 


1432547 


1433398 


656831 


655980 


608 


1502 


1433432 


1434445 


655946 


654933 


609 


1503 


1434874 


1435398 


654504 


653980 


244 


1504 


1435395 


1436108 


653983 


653270 


1594 


1505 


1436180 


1436593 


653198 


652785 


1916 


1506 


1436645 


1436935 


652733 


652443 


1915 


1507 


1436958 


1437776 


652420 


651602 


1593 


1508 


1437769 


1438527 


651609 


650851 


1212 


1509 


1438502 


1439275 


650876 


650103 


1914 


1510 


1439272 


1439982 


650106 


649396 


1211 


1511 


1439994 


1440776 


649384 


648602 


1592 


1512 


1441115 


1441582 


648263 


647796 


610 


1513 


1441557 


1441976 


647821 


647402 


1591 


1514 


1441888 


1442184 


647490 


647194 


1210 


1515 


1442268 


1442525 


647110 


646853 


977 


1516 


1442602 


1444524 


646776 


644854 


245 


1517 


1444521 


1444967 


644857 


644411


1590 


1518 


1445288 


1446001 


644090 


643377 


1913 


1519 


1446421 


1446744 


642957 


642634 


1209 


1520 


1447018 


1447827 


642360 


641551 


246 


1521 


1447763 


1448299 


641615 


641079 


1912 


1522 


1448354 


1448527 


641024 


640851 


1911 


1523 


1448733 


1449227 


640645 


640151 


978 


1524 


1449764 


1450072 


639614 


639306 


611 


1525 


1450076 


1451272 


639302 


638106 


612 


1526 


1451362 


1452348 


638016 


637030 


247


1527


1452345 


1452566 


637033 


636812 


1589 


1528 


1452921 


1453571 


636457 


635807 


1588 


1529 


1453739 


1453954 


635639 


635424 


613 


1530 


1454658 


1454753 


634720 


634625 


1587 


1531 


1455780 


1457495 


633598 


631883 


1586 


1532 


1458373 


1458516 


631005 


630862 


1208 


1533 


1460859 


1461371 


628519 


628007 


1585 


1534 


1461343 


1461726 


628035 


627652 


1207 


1535 


1462494 


1463108 


626884 


626270 


1584 


1536 


1463105 


1464283 


626273 


625095 


1910 


1537 


1464255 


1466492 


625123 


622886 


1583 


1538 


1466599 


1467609 


622779 


621769 


1206 


1539 


1467655 


1467744 


621723 


621634 


248 


1540 


1467769 


1467906 


621609 


621472 


249 


1541 


1467891 


1468676 


621487 


620702 


1582 


1542 


1468498 


1469019 


620880 


620359 


1205 


1543 


1469265 


1470533 


620113 


618845 


979 


1544 


1470609 


1471790 


618769 


617588 


1581 


1545 


1471812 


1471937 


617566 


617441 


1580 


1546 


1471870 


1472673 


617508 


616705 


250 


1547 


1474731 


1474928 


614647 


614450 


1579 


1548 


1475072 


1475983 


614306 


613395 


1909 


1549 


1477107 


1477574 


612271 


611804 


980 


1550 


1477584


1479029 


611794 


610349 


1578 


1551 


1479030 


1479884 


610348 


609494 


1577 


1552 


1480088 


1480873 


609290 


608505 


614 


1553 


1480960 


1481781 


608418 


607597 


1204 


1554 


1481753 


1481869 


607625 


607509 


1908 


1555 


1482049 


1482780 


607329 


606598 


1203 


1556 


1484422 


1486413 


604956 


602965 


251 


1557 


1486448 


1488211 


602930 


601167 


615 


1558 


1488253 


1489308 


601125 


600070 


1202


1559


1489417 


1490157 


599961 


599221 


252 


1560 


1490211 


1490753 


599167 


598625 


981 


1561 


1490896 


1491087 


598482 


598291 


253 


1562 


1491222 


1491395 


598156


1869


1855 


1805521 


1805853 


283857 


283525 


1155 


1856 


1805911 


1806657 


283467 


282721 


1154 


1857 


1806654 


1807073 


282724 


282305 


1516 


1858 


1807161 


1808084 


282217 


281294 


1041 


1859 


1808249 


1808404 


281129 


280974 


664 


1860 


1808394 


1808819 


280984 


280559 


1515 


1861 


1808985 


1811618 


280393 


277760 


1042 


1862 


1811744 


1812487 


277634 


276891 


665 


1863 


1812518 


1813510 


276860 


275868 


1868 


1864 


1813353 


1813550 


276025 


275828 


1043 


1865 


1813638 


1814054 


275740 


275324 


1514 


1866 


1814141 


1814644 


275237 


274734 


1867 


1867 


1814559 


1814648 


274819 


274730 


1044 


1868 


1814829 


1815962 


274549 


273416 


1045 


1869 


1815959 


1817002 


273419 


272376 


666 


1870 


1816999 


1817745 


272379 


271633 


295 


1871 


1817756 


1818715 


271622 


270663 


667 


1872 


1819570 


1819776 


269808 


269602 


1153 


1873 


1820187 


1820936 


269191 


268442 


1513 


1874 


1820961 


1821659 


268417 


267719 


1512 


1875 


1821659 


1821841 


267719 


267537 


1866 


1876 


1822105 


1823073 


267273 


266305 


296 


1877 


1823702 


1823782 


265676 


265596 


1865 


1878 


1823857 


1824675 


265521 


264703 


297 



1879 


1824662 


1825624 


264716 


263754 


1864 


1880 


1825648 


1826151 


263730 


263227 


298 


1881 


1826226 


1826504 


263152 


262874 


1511 


1882 


1826572 


1826886 


262806 


262492 


299 


1883 


1826859 


1827470 


262519 


261908 


1046 


1884 


1827563 


1828408 


261815 


260970 


1863 


1885 


1828493 


1829698 


260885 


259680 


668 


1886 


1829731 


1830558 


259647 


258820 


300 


1887 


1830621 


1831115 


258757 


258263 


1510 


1888 


1831076 


1831645 


258302 


257733 


1862 


1889 


1831699 


1832772 


257679 


256606 


301 


1890 


1832777 


1833709 


256601 


255669 


669 


1891 


1833706 


1834158 


255672 


255220 


1152 


1892 


1834155 


1834856 


255223 


254522 


1509 


1893 


1834992 


1835603 


254386 


253775 


1047 


1894 


1835581 


1836201 


253797 


253177 


302 


1895 


1836239 


1837111 


253139 


252267 


670 


1896 


1837108 


1838508 


252270 


250870 


1151 


1897 


1838515 


1839846 


250863 


249532 


1150 


1898 


1839843 


1842821 


249535 


246557 


1508 


1899 


1842996 


1844864 


246382 


244514 


1507 


1900 


1844947 


1845273 


244431 


244105 


303 


1901 


1845241 


1845942 


244137 


243436 


1149 


1902 


1845932 


1846168 


243446 


243210 


671 


1903 


1846267 


1847184 


243111


242194 


1148 


1904 


1847191 


1848111 


242187 


241267 


1147 


1905 


1848117 


1849664 


241261 


239714 


1506 


1906 


1853437 


1853742 


235941 


235636 


1146 


1907 


1853826 


1853894 


235552 


235484 


1048 


1908 


1853933 


1854607 


235445 


234771 


1861 


1909 


1854612 


1855832 


234766 


233546 


1505 


1910 


1855928 


1857586 


233450 


231792 


1860


1911


1857656 


1858012 


231722 


231366 


672 


1912 


1858017 


1859300 


231361 


230078 


1504 


1913 


1859380 


1859607 


229998 


229771 


1145 


1914 


1859695 


1860141 


229683 


229237 


1144 


1915 


1860556 


1860741 


228822 


228637 


1143 


1916 


1860814 


1862100 


228564 


227278 


1142 


1917 


1862097 


1862900 


227281 


226478 


1503 


1918 


1862902 


1863786 


226476 


225592 


1141 


1919 


1863783 


1864895 


225595 


224483 


1502 


1920 


1865656 


1866711 


223722 


222667 


304 


1921 


1866693 


1867223 


222685 


222155 


1049 


1922 


1867473 


1868666 


221905 


220712 


1050 


1923 


1868696 


1869637 


220682 


219741 


673 


1924 


1869643 


1870143 


219735 


219235 


305 


1925 


1870833 


1871861 


218545 


217517 


1051 


1926 


1872015 


1872557 


217363 


216821 


1052 


1927 


1872533 


1872811 


216845 


216567 


674 


1928 


1872808 


1873179 


216570 


216199 


306 


1929 


1873176 


1873442 


216202 


215936 


1053 


1930 


1873439 


1873735 


215939 


215643 


675 


1931 


1873732 


1874181 


215646 


215197 


307 


1932 


1874169 


1874537 


215209 


214841 


1054 


1933 


1874534 


1876078 


214844 


213300 


676 


1934 


1876071 


1876427 


213307 


212951 


1055 


1935 


1876465 


1876995 


212913 


212383 


308 


1936 


1876992 


1877561 


212386 


211817 


1056 


1937 


1877558 


1878838 


211820 


210540 


677 


1938 


1878843 


1879835 


210535 


209543 


1057 


1939 


1879832 


1880263 


209546 


209115 


678 


1940 


1880264 


1880797 


209114 


208581 


1859 


1941 


1880784 


1881278 


208594 


208100 


1501 


1942 


1881271 


1881759 


208107 


207619 


1140


1943


1881790 


1882272 


207588 


207106 


1139 


1944 


1882334 


1883542 


207044 


205836 


679 


1945 


1883543 


1884076 


205835 


205302 


680 


1946 


1884157 


1885149 


205221 


204229 


309 


1947 


1885281 


1886627 


204097 


202751 


1058 


1948 


1886671 


1887270 


202707 


202108 


310 


1949 


1887267 


1887560 


202111


201818 


1500 


1950 


1887544 


1888218 


201834 


201160 


1138 


1951 


1888724 


1890025 


200654 


199353 


681 


1952 


1890006 


1890557 


199372 


198821 


1499 


1953 


1890634 


1894026 


198744 


195352 


311 


1954 


1894318 


1894365 


195060 


195013 


312 


1955 


1894442 


1895158 


194936 


194220 


682 


1956 


1895222 


1895692 


194156 


193686 


1858 


1957 


1895730 


1896284 


193648 


193094 


1498 


1958 


1896330 


1896818 


193048 


192560 


1497 


1959 


1896886 


1897806 


192492 


191572 


313 


1960 


1897803 


1898744 


191575 


190634 


1496 


1961 


1898830 


1899255 


190548 


190123 


1137 


1962 


1899309 


1900178 


190069 


189200 


1059 


1963 


1900171 


1900881 


189207 


188497 


1136 


1964 


1901205 


1901720 


188173 


187658 


1495 


1965 


1901783 


1902706 


187595 


186672 


683 


1966 


1902746 


1903273 


186632 


186105 


684 


1967 


1903277 


1904434 


186101 


184944 


685 


1968 


1904431 


1905462 


184947 


183916 


314 


1969 


1905501 


1906337 


183877 


183041 


1060 


1970 


1906334 


1907098 


183044 


182280 


1857 


1971 


1907089 


1908066 


182289 


181312 


1135 


1972 


1908127 


1909461 


181251 


179917 


1134 


1973 


1909517 


1910014 


179861 


179364 


686 


1974 


1910023 


1910727 


179355 


178651 


315


1975


1912010 


1912546 


177368 


176832 


687 


1976  1912651  1912902  176727  176476  316
1977  1912921  1913589  176457  175789  1133
1978  1913472  1914050  175906  175328  1494
1979  1914387  1914812  174991  174566  1493
1980  1914882  1916204  174496  173174  1492
1981  1916252  1916479  173126  172899  688
1982  1916521  1917351  172857  172027  317
1983  1917310  1917879  172068  171499  1132
1984  1918215  1918709  171163  170669  1061
1985  1918693  1920390  170685  168988  1131
1986  1920429  1921331  168949  168047  1491
1987  1921407  1923065  167971  166313  1490
1988  1923377  1923970  166001  165408  1856
1989  1923967  1924317  165411  165061  1130
1990  1924478  1926250  164900  163128  689
1991  1926252  1926566  163126  162812  1062
1992  1926707  1929025  162671  160353  690
1993  1929037  1930491  160341  158887  1129
1994  1930573  1930920  158805  158458  318
1995  1930917  1931588  158461  157790  1063
1996  1931535  1932002  157843  157376  1489
1997  1932193  1932927  157185  156451  319
1998  1932928  1933236  156450  156142  1128
1999  1933306  1933578  156072  155800  320
2000  1933671  1934051  155707  155327  1064
2001  1934029  1935735  155349  153643  1127
2002  1935745  1936650  153633  152728  1126
2003  1936888  1937835  152490  151543  1125
2004  1937965  1939305  151413  150073  1124
2005  1941378  1941863  148000  147515  1065
2006  1942184  1942507  147194  146871  691
2007  1942618  1944576  146760  144802  1123


2008  1944729  1945865  144649  143513  1488
2009  1945993  1946349  143385  143029  1122
2010  1947328  1948446  142050  140932  321
2011  1948368  1949834  141010  139544  1066
2012  1949788  1951875  139590  137503  1121
2013  1951825  1953192  137553  136186  322
2014  1953189  1954478  136189  134900  1067
2015  1954540  1955208  134838  134170  323
2016  1955253  1957394  134125  131984  1068
2017  1957397  1958206  131981  131172  1855
2018  1958454  1958975  130924  130403  1487
2019  1959384  1959980  129994  129398  1486
2020  1959997  1960209  129381  129169  1120
2021  1961911  1965690  127467  123688  1119
2022  1962226  1962360  127152  127018  324
2023  1964567  1964629  124811  124749  692
2024  1965873  1966658  123505  122720  1069
2025  1966899  1969403  122479  119975  1070
2026  1969396  1970652  119982  118726  325
2027  1970804  1971262  118574  118116  693
2028  1971328  1971672  118050  117706  326
2029  1971682  1972395  117696  116983  327
2030  1972493  1973851  116885  115527  694
2031  1974299  1975357  115079  114021  1854
2032  1975695  1977017  113683  112361  1071
2033  1976971  1977399  112407  111979  1118
2034  1977396  1977704  111982  111674  1485
2035  1977819  1978400  111559  110978  1484
2036  1978397  1978993  110981  110385  1853
2037  1978966  1979769  110412  109609  1117
2038  1979866  1980489  109512  108889  328
2039  1980484  1980942  108894  108436  1116


2040  1980946  1981878  108432  107500  1115
2041  1981986  1982897  107392  106481  1072
2042  1982894  1983307  106484  106071  695
2043  1983573  1984325  105805  105053  1483
2044  1984369  1985724  105009  103654  1114
2045  1985942  1987522  103436  101856  696
2046  1987535  1988848  101843  100530  1852
2047  1988883  1989671  100495  99707  1482
2048  1989712  1990701  99666  98677  1113
2049  1991043  1992029  98335  97349  1481
2050  1992178  1993323  97200  96055  1112
2051  1993320  1993928  96058  95450  1480
2052  1993956  1994684  95422  94694  1479
2053  1994681  1995694  94697  93684  1851
2054  1995731  1997062  93647  92316  1850
2055  1997062  1999713  92316  89665  1111
2056  1999710  2001092  89668  88286  1478
2057  2001233  2003020  88145  86358  1849
2058  2003136  2003711  86242  85667  1073
2059  2003696  2004217  85682  85161  697
2060  2004220  2004576  85158  84802  1110
2061  2004890  2004943  84488  84435  698
2062  2005188  2006615  84190  82763  1477
2063  2006536  2009136  82842  80242  329
2064  2009133  2010641  80245  78737  1074
2065  2010697  2012013  78681  77365  330
2066  2012072  2012314  77306  77064  699
2067  2012311  2012514  77067  76864  1109
2068  2012712  2013572  76666  75806  1476
2069  2013609  2014661  75769  74717  1475
2070  2014525  2015568  74853  73810  1108
2071  2015632  2016564  73746  72814  1107


2072  2016684  2017421  72694  71957  1075
2073  2017378  2018802  72000  70576  331
2074  2019182  2019406  70196  69972  1848
2075  2019763  2020425  69615  68953  1106
2076  2020435  2021076  68943  68302  1105
2077  2021157  2021522  68221  67856  1076
2078  2021495  2022214  67883  67164  700
2079  2022269  2023111  67109  66267  701
2080  2025340  2025417  64038  63961  332
2081  2028631  2028912  60747  60466  333
2082  2028914  2029489  60464  59889  702
2083  2029483  2030094  59895  59284  1104
2084  2030142  2031023  59236  58355  1474
2085  2031138  2032727  58240  56651  1077
2086  2032734  2033420  56644  55958  1473
2087  2033501  2034466  55877  54912  703
2088  2034330  2035610  55048  53768  1078
2089  2035637  2036254  53741  53124  704
2090  2036331  2036594  53047  52784  1079
2091  2036609  2037244  52769  52134  705
2092  2037290  2038219  52088  51159  706
2093  2038219  2039394  51159  49984  334
2094  2039429  2040040  49949  49338  707
2095  2039994  2040326  49384  49052  1080
2096  2040316  2040816  49062  48562  1103
2097  2040797  2041732  48581  47646  1847
2098  2043010  2044203  46368  45175  1102
2099  2044340  2045170  45038  44208  708
2100  2045127  2046032  44251  43346  1472
2101  2046077  2047399  43301  41979  709
2102  2047406  2047780  41972  41598  710
2103  2047777  2048313  41601  41065  1101


2104  2048320  2049099  41058  40279  1100
2105  2049106  2049471  40272  39907  1099
2106  2050697  2051614  38681  37764  711
2107  2051664  2051900  37714  37478  1081
2108  2051888  2052298  37490  37080  712
2109  2052295  2053014  37083  36364  335
2110  2053125  2053190  36253  36188  1082
2111  2055992  2057146  33386  32232  1846
2112  2057204  2057467  32174  31911  1845
2113  2057477  2058655  31901  30723  1844
2114  2058742  2059149  30636  30229  1098
2115  2059310  2059501  30068  29877  713
2116  2059560  2060801  29818  28577  1083
2117  2060819  2061598  28559  27780  714
2118  2061501  2061911  27877  27467  1084
2119  2061997  2062446  27381  26932  1097
2120  2062448  2062966  26930  26412  1843
2121  2062966  2063607  26412  25771  1096
2122  2063612  2064214  25766  25164  1842
2123  2064280  2065428  25098  23950  1095
2124  2065471  2066778  23907  22600  1094
2125  2066863  2067558  22515  21820  336
2126  2067623  2068384  21755  20994  715
2127  2068384  2069838  20994  19540  337
2128  2069828  2070184  19550  19194  1841
2129  2070189  2070728  19189  18650  1471
2130  2070778  2071599  18600  17779  1093
2131  2071722  2072069  17656  17309  1085
2132  2072066  2072986  17312  16392  716
2133  2073002  2073490  16376  15888  717
2134  2073534  2073737  15844  15641  1470
2135  2074012  2075424  15366  13954  338


2136  2075557  2076162  13821  13216  339
2137  2076199  2076411  13179  12967  1092
2138  2076528  2076959  12850  12419  1086
2139  2076986  2077663  12392  11715  718
2140  2077703  2078152  11675  11226  719
2141  2078164  2078964  11214  10414  1091
2142  2079001  2080026  10377  9352  1090
2143  2080319  2082169  9059  7209  720
2144  2082376  2082897  7002  6481  340
2145  2082919  2083284  6459  6094  1089
2146  2083288  2084007  6090  5371  1088
2147  2084057  2085316  5321  4062  1840
2148  2085470  2087110  3908  2268  721
2149  2087216  2088568  2162  810  1839
2150  2088670  2088921  708  457  341
2151  2088905  2089378  473  0  722






In one embodiment, such a region is selected from the group
consisting of genes (1) through (2151).

As used herein, in the above Table, translated amino
acid sequences usually start with methionine, and are
identified as "amino acid SEQ ID NO: Y" (SEQ ID NOs: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837, and 1839-2157);
however, the other reading frames may also be readily
translated using known molecular biological techniques. It
is also understood that a polypeptide produced by another
open reading frame is also encompassed in the scope of the
present invention.

The accuracy of the sequence disclosed herein is
sufficient and suitable for a variety of applications well
known in the art and further described hereinbelow. For
example, the sequence of the open reading frame region of
SEQ ID NO: 1 is useful for designing a nucleic acid
hybridization probe for detection of cDNA contained in the
nucleic acid sequence in the open reading frame. These
probes also hybridize with a nucleic acid molecule in a
biological sample, thereby enabling a variety of forensic
and diagnostic methods of the present invention. Similarly,
the polypeptide identified by SEQ ID NO: Z may be used,
for example, for producing an antibody specifically binding
to a protein (including a polypeptide and secreted protein)
encoded by an open reading frame identified herein.

Although we have analyzed the sequence of the present
invention with special care, DNA sequences produced by
sequencing reactions may contain sequencing errors.
Such an error may be present as an incorrectly identified
nucleotide, or as an insertion or a deletion of a nucleotide
in the DNA sequence produced. Incorrectly inserted or
deleted nucleotides cause frame shifts in the deduced amino
acid sequence of the reading frame. In such cases, the
produced DNA sequence may be more than 99.9% identical to
the actual sequence (for example, one base insertion or
deletion in an open reading frame of over 1000 bases), but
the deduced amino acid sequence may differ from the actual
amino acid sequence.
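
The effect described above can be illustrated with a minimal
sketch. The toy open reading frame below is hypothetical (it
is not taken from SEQ ID NO: 1), and identity is computed
simply as unchanged bases over alignment length for a single
inserted base; the translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 1005-base ORF: start codon, 333 alanine codons, stop codon.
orf = "ATG" + "GCT" * 333 + "TAA"

# Simulated sequencing error: one extra base inserted after position 10.
mutated = orf[:10] + "A" + orf[10:]

# One gap column in an alignment of len(mutated) columns.
identity = len(orf) / len(mutated)

print(f"DNA identity: {identity:.4%}")
print("original protein starts:", translate(orf)[:8])
print("deduced protein starts: ", translate(mutated)[:8])
```

The DNA identity stays above 99.9%, yet every residue
downstream of the insertion is read in a shifted frame.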

Accordingly, for applications where accuracy is
required in the nucleotide or amino acid sequence, the present
invention also provides the nucleic acid sequence and the
amino acid sequence encoded by the genome of Thermococcus
kodakaraensis KOD1 of the present invention, which was
deposited in the International Patent Organism Depositary
(IPOD). Those skilled in the art may determine a more
accurate sequence by sequencing the genome of the
deposited Thermococcus kodakaraensis KOD1 of the present
invention. Also provided in the present invention
are allelic variants, orthologs, and/or species homologs.


In another aspect, the present invention provides a
nucleic acid molecule per se having a sequence set forth
in SEQ ID NO: 1 or 1087. The nucleic acid molecule per se
is useful in the targeted gene disruption method of the
present invention.

In another aspect, the present invention provides a
nucleic acid molecule comprising at least eight contiguous
nucleotides of the sequence set forth in SEQ ID
NO: 1 or 1087.

As used herein, the term "probe" refers to a substance
for use in searching, which is a nucleic acid sequence having
a variable length. Probes vary depending on the use
thereof. Examples of a nucleic acid molecule as a common
probe include one having a nucleic acid sequence of at least
about 8 nucleotides in length, preferably at least about
10 nucleotides, preferably at least about 15 nucleotides,
preferably at least about 20 nucleotides, preferably at
least about 30 nucleotides, preferably at least about 40
nucleotides, preferably at least about 50 nucleotides,
preferably at least about 100 nucleotides, or may be at least
about 6000 nucleotides. Probes are used for detecting an
identical, similar or complementary nucleic acid sequence.
Longer probes are usually available from natural or
recombinant sources, are very specific, and hybridize much
more slowly than oligomers. Probes may be single- or
double-stranded, and are designed to have specificity in
technologies such as PCR, membrane-based hybridization or
ELISA and the like.
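
As a rough illustration of how a single-stranded probe finds
a complementary sequence, the sketch below searches both
strands of a double-stranded target for exact matches. The
sequences and the function names are hypothetical, and exact
string matching is a simplifying assumption: real
hybridization tolerates mismatches depending on stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def find_probe_sites(target_top, probe, min_len=8):
    """List (position, strand) sites where the probe can base-pair.

    target_top is the top strand (5'->3') of a double-stranded target.
    The probe pairs with the bottom strand where the top strand reads
    like the probe, and with the top strand where the top strand reads
    like the probe's reverse complement.
    """
    if len(probe) < min_len:
        raise ValueError(f"probe shorter than {min_len} nucleotides")
    sites = []
    queries = (("bottom", probe), ("top", reverse_complement(probe)))
    for strand, query in queries:
        start = target_top.find(query)
        while start != -1:
            sites.append((start, strand))
            start = target_top.find(query, start + 1)
    return sorted(sites)


# Hypothetical 20-base target and 10-base probe.
target = "TTGACCATGGCTAAGGTTCA"
probe = "ATGGCTAAGG"

print(find_probe_sites(target, probe))
print(find_probe_sites(target, reverse_complement(probe)))
```

The default minimum length of 8 mirrors the shortest probe
length named in the text; it is a parameter, not a
recommendation.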

As used herein, the term "primer" refers to a nucleic
acid sequence having variable length, which serves for
initiation of elongation of a polynucleotide strand in a
synthetic reaction of a nucleic acid such as PCR. Examples
of a nucleic acid molecule as a common primer include one
having a nucleic acid sequence having a length of at least
about 6 nucleotides, at least about 7 nucleotides, at least
about 8 nucleotides, preferably at least about 10
nucleotides, preferably at least about 15 nucleotides, at
least about 17 nucleotides, preferably at least about 20
nucleotides, preferably at least about 30 nucleotides,
preferably at least about 40 nucleotides, preferably at
least about 50 nucleotides, preferably at least about 100
nucleotides, or may be at least about 6000 nucleotides.

In one aspect, the present invention provides a
polypeptide having an amino acid sequence selected from the
group consisting of Gene IDs (1) through (2151) as listed
in Table 1 (namely, SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837, and 1839-2157). The polypeptide of
the present invention is preferably fused to another protein.
These fusion proteins may be used for a variety of
applications. For example, fusion of a His tag, HA tag,
Protein A, an IgG domain or maltose binding protein to the
polypeptide of the present invention facilitates
purification (see also EP A 394,827; Traunecker et al.,
Nature, 331:84-86 (1988)).

In another aspect, the present invention provides a
peptide molecule comprising at least one amino acid sequence
from an amino acid sequence selected from the group consisting
of Gene IDs (1) through (2151) as listed in Table 1 (namely,
SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837,
and 1839-2157). Such peptide molecules may be used as an
epitope. Preferably, such a peptide molecule may comprise
a sequence of at least about 4 amino acids, at least about
5 amino acids, at least about 6 amino acids, at least about
7 amino acids, at least about 8 amino acids, at least about
9 amino acids, at least about 10 amino acids, at least about
15 amino acids, at least about 20 amino acids, at least
about 30 amino acids, at least about 40 amino acids, at
least about 50 amino acids, or at least about 100 amino acids.
The longer the peptide becomes, the higher the specificity
thereof becomes.
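
The relationship between peptide length and specificity can
be sketched as follows. The two protein sequences are
hypothetical examples (not taken from Table 1), and
"specific" is taken here to mean only that a peptide does
not occur in the other sequence.

```python
def peptide_windows(protein, length):
    """All contiguous peptides of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def specific_windows(protein, others, length):
    """Peptides of `protein` that occur in none of the `others`."""
    return [
        window
        for window in peptide_windows(protein, length)
        if not any(window in other for other in others)
    ]


# Two hypothetical, closely related proteins.
protein_a = "MKVLAAGIVGLS"
protein_b = "MKVLSAGIVGTT"

for length in (4, 8):
    total = len(peptide_windows(protein_a, length))
    unique = len(specific_windows(protein_a, [protein_b], length))
    print(f"length {length}: {unique} of {total} peptides are specific")
```

With these toy sequences, a third of the 4-residue peptides
are shared with the related protein, while none of the
8-residue peptides are.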



As used herein, the term "epitope" refers to a portion
of a polypeptide having antigenicity or immunogenicity in
an animal, preferably a mammal, and most preferably in a
human. In a preferable embodiment, the invention comprises
a polypeptide comprising an epitope, and a polynucleotide
encoding the polypeptide. As used herein, the term
"immunogenic epitope" is defined as a portion of a protein
inducing an antibody reaction in an animal, as determined by
any method known in the art, such as those for producing an
antibody described herein below (see, for example, Geysen
et al., Proc. Natl. Acad. Sci. USA 81:3998-4002 (1983)). As
used herein, the term "antigenic epitope" refers to a portion
of a protein capable of binding to an antibody in an
immunologically specific manner, as determined by any
method well known in the art, such as an immunoassay as
described herein. Immunologically specific binding
excludes non-immunological binding, but does not
necessarily exclude cross-reaction with different antigens.
Antigenic epitopes are not necessarily immunogenic.

Fragments working as an epitope may be produced by any
method conventionally known in the art (for example, see
Houghten, Proc. Natl. Acad. Sci. USA 82:5131-5135 (1985);
see also US Patent No. 4,631,211).

As used herein, an antigenic epitope usually comprises
at least 3 amino acids, preferably at least 4
amino acids, at least 5 amino acids, at least 6 amino acids,
at least 7 amino acids, more preferably at least 8 amino
acids, at least 9 amino acids, at least 10 amino acids, at
least 11 amino acids, at least 12 amino acids, at least 13
amino acids, at least 14 amino acids, at least 15 amino
acids, at least 20 amino acids, at least 25 amino acids,
at least 30 amino acids, at least 40 amino acids, at least
50 amino acids, and most preferably comprises a sequence
of between about 15 amino acids and 30 amino acids.
Preferable polypeptides comprising an immunogenic epitope
or antigenic epitope are at least 10, 15, 20, 25, 30, 35,
40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95 or 100 amino
acid residues in length. Further, non-exclusive preferable
antigenic epitopes include the antigenic epitopes
disclosed herein and portions thereof. Antigenic epitopes
are useful for raising an antibody (including a monoclonal
antibody) capable of specifically binding to the epitope.
Preferable antigenic epitopes include the antigenic epitopes
disclosed herein and any combination of 2, 3, 4, 5
or more of these antigenic epitopes. Antigenic epitopes may
be used as a target molecule in an immunoassay (see, for
example, Wilson et al., Cell 37:767-778 (1984); Sutcliffe
et al., Science 219:660-666 (1983)).

Similarly, with respect to the use of an immunogenic
epitope, for example, an antibody may be induced according
to a method well known in the art (see, for example, Sutcliffe
et al. (ibid.); Wilson et al. (ibid.); Chow et al., Proc.
Natl. Acad. Sci. USA 82:910-914; and Bittle et al., J.
Gen. Virol. 66:2347-2354 (1985)). Preferable immunogenic
epitopes are the immunogenic epitopes disclosed herein,
and any combination of two, three, four, five or more of
these immunogenic epitopes. A polypeptide comprising one or
more immunogenic epitopes may be presented to an animal
system (for example, rabbit or mouse) together with a
carrier protein (for example, albumin) to raise an antibody
response, or, if the polypeptide is sufficiently long (at
least about 25 amino acids), the polypeptide may be presented
without a carrier. However, immunogenic epitopes as short
as 8-10 amino acids have been shown to be sufficient for
raising an antibody capable of binding to (at least) a linear
epitope of a modified polypeptide (for example, by Western
blotting).

Epitope-containing polypeptides of the present
invention may be used for inducing an antibody according
to a technology well known in the art. Such methods include,
but are not limited to, in vivo immunization, in vitro
immunization, and the phage display method. For example, see
Sutcliffe et al., ibid.; Wilson et al., ibid.; and Bittle et
al., J. Gen. Virol., 66:2347-2354 (1985). When using in
vivo immunization, an animal may be immunized using a free
peptide. However, anti-peptide antibody titer may be
boosted by binding the peptide to a macromolecular carrier
(for example, keyhole limpet hemocyanin (KLH) or tetanus
toxoid). For example, a peptide comprising a cysteine
residue may be bound to a carrier by the use of a linker
such as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS).
On the other hand, other peptides may be bound to a carrier
by the use of a more general linking agent such as
glutaraldehyde. An animal such as a rabbit, rat, or mouse may
be immunized by intraperitoneal and/or intradermal injection
of, for example, an emulsion (containing about 100 µg of a
peptide or carrier protein and Freund's adjuvant or any other
adjuvant known to stimulate an immune response). Several
booster injections, for example at intervals of about two
weeks, may be necessary to provide an effective titer of
anti-peptide antibody. This titer may be detected by an ELISA
assay using a free peptide adsorbed onto a solid surface. The
titer of such anti-peptide antibodies in the serum derived from
an immunized animal may be enhanced by selecting
anti-peptide antibodies (for example, by adsorption of the
peptide onto a solid support and elution of the selected
antibody according to a method well known in the art).

As can be understood by those skilled in the art, and
as discussed hereinabove, the polypeptide of the present
invention comprising an immunogenic or antigenic epitope
may be fused to another polypeptide. For example, the
polypeptide of the present invention may be fused to a
constant domain or a portion thereof (CH1, CH2, CH3 or any
combination or fragment thereof), or albumin (including,
but not limited to, recombinant albumin; see,
for example, US Patent No. 5,876,969 (issued March 2, 1999),
EP 0 413 622 and US Patent No. 5,766,883 (issued
June 16, 1998), which are herein incorporated by reference
in their entireties), to result in a chimeric protein. Such
a fusion protein may facilitate purification and enhance
half-life in vivo. This has been demonstrated for a chimeric
protein consisting of the first two domains of a human CD4
polypeptide and a variety of domains from heavy chain
or light chain constant regions of a mammalian
immunoglobulin. For example, see EP 394,827; Traunecker et al.,
Nature, 331:84-86 (1988). Enhanced delivery of an
antigen into the immune system across the epidermal barrier
has been demonstrated for an antigen (for example, insulin)
bound to an IgG or an FcRn binding partner such as an Fc
fragment (see PCT publications WO 96/22024 and WO 99/04812).
IgG fusion proteins having a dimeric structure due to disulfide
bonding of the IgG portions have also been demonstrated to
be more effective in binding and neutralizing another
molecule than a monomeric polypeptide or a fragment thereof
alone. See Fountoulakis et al., J. Biochem., 270:3958-3964
(1995). A nucleic acid encoding the epitope may be
recombined with a gene of interest as an epitope tag (for
example, a hemagglutinin "HA" or flag tag) to assist detection
and purification of the expressed polypeptide. For example,
a system described by Janknecht et al. allows simple
purification of a non-modified fusion protein expressed in
a human cell line (see Janknecht et al., 1991, Proc. Natl.
Acad. Sci. USA 88: 8972-897). In this system, a gene of
interest may be subcloned into a vaccinia recombinant
plasmid to result in fusion of the open reading frame of
the gene with an amino terminal tag consisting of six
histidine residues upon translation. This tag functions as
a substrate binding domain for the fusion protein. An
extract from a cell infected with the recombinant vaccinia
virus may be loaded onto a Ni2+ nitriloacetic acid-agarose
column, and the histidine-tagged protein may be selectively
eluted using an imidazole-containing buffer.


An "isolated" nucleic acid molecule is separated from
the other nucleic acid molecules present in the natural
source of the subject nucleic acid molecule. Examples of
such isolated nucleic acid molecules include, but are not
limited to, recombinant DNA molecules
contained in a vector, recombinant DNA molecules maintained
in a heterologous host cell, nucleic acid molecules
partially or substantially purified, and synthetic DNA or
RNA molecules. Preferably, an "isolated" nucleic acid is free
of the sequences naturally flanking the subject nucleic acid
in the genomic DNA of the organism from which the subject
nucleic acid is derived (i.e., sequences located at the 5' and
3' termini of the subject nucleic acid). For example, in
a variety of embodiments, isolated novel nucleic acid
molecules may include a nucleotide sequence of less than about
50 kb, 25 kb, 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1
kb. Further, "isolated" nucleic acid molecules, for
example, cDNA molecules, may be substantially free of other
cellular materials or culture medium when recombinantly
produced, or of chemical precursors or other chemical
substances when chemically synthesized.

In one aspect, the present invention provides a
nucleic acid molecule comprising a sequence encoding an
amino acid sequence having at least one amino acid sequence
selected from the group consisting of Gene ID Nos. 1-2151
of Table 1 (at least one sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157); or a sequence having
70% homology thereto.

In another aspect, the present invention provides a
polypeptide having at least one amino acid sequence selected
from the group consisting of Gene ID Nos. 1-2151 of Table
1 (comprising at least one amino acid sequence selected from
the group consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157), or a sequence having
at least 70% homology thereto.

In another aspect, the present invention provides an
epitope or a variant thereof, having at least one amino acid
sequence selected from the group consisting of Gene ID Nos.
1-2151 of Table 1 (at least one amino acid sequence
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157), or a sequence having at least 70%
homology thereto, or a portion thereof.
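
One simple way to put a number on a threshold such as 70% is
percent identity over a pairwise alignment. The sketch below
assumes that reading of "homology", which the text itself
does not define, and it assumes the two sequences have
already been aligned (gaps written as "-"). The sequences
are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue.

    Columns where both sequences have a gap are ignored; a column
    with a gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a, aligned_b)
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no residues")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptide pairs.
print(percent_identity("MKVLAAGIVG", "MKVLSAGIVA"))
print(percent_identity("MKV-LAAG", "MKVTLSAG"))
print(percent_identity("MKVLAAGIVG", "MKVLSAGIVA") >= 70.0)
```

Other definitions (for example, counting conservative
substitutions as matches, or dividing by the shorter
sequence length) give different values for the same pair.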

In another aspect, the present invention provides a
method for screening for a thermostable protein. The
present method comprises A) providing the entire sequence
of the genome of a thermoresistant living organism; B)
selecting at least one arbitrary region of the sequence;
C) providing a vector comprising a sequence complementary
to the selected region and a gene encoding a candidate for
the thermostable protein; D) transforming the living
organism with the vector; E) placing the thermoresistant
living organism in a condition allowing homologous
recombination to occur; F) selecting the thermoresistant living
organism in which homologous recombination has occurred;
and G) performing an assay to identify the thermoresistant
protein. As used herein, the entire sequence of the genome
need not necessarily be a complete sequence, but preferably
is an entire complete sequence. Two or more regions may be
selected as the selected region. The length of
the region may be any length, as long as homologous
recombination occurs, and includes, for example, at least
about 500 bases, at least about 600 bases, at least about
700 bases, at least about 800 bases, at least about 900 bases,
at least about 1000 bases, at least about 2000 bases, and
the like. The candidate for the above thermostable protein
may be any protein of the present invention, as long as the
expression thereof is expected. Vectors may be any vector,
as long as they can express the protein of interest.

Vectors may preferably comprise gene regulation
elements such as a promoter. Transformation may be performed
under any condition appropriate therefor.

Conditions causing homologous recombination may be
any condition, as long as homologous recombination occurs
under such conditions. Usually, the following condition
may be used:

Tk-pyrF deleted strains No. 25 and No. 27 are cultured in
20 ml of ASW-YT liquid medium.
↓
Collect the bacteria from 3 ml of the culture medium per
sample (No. 25 and No. 27, five samples for each).
↓
Suspend the cells in 200 µl of 0.8 x ASW + 80 mM CaCl2, and
let stand on ice for 30 minutes.
↓
Mix in 3 µg of pUC118/DS and 3 µg of pUC118/DD, and let stand
on ice for 1 hour (two samples for each; an equivalent volume
of TE buffer added to the sample was used as a control).
↓
Heat shock at 85 °C for 45 s.
↓
Let stand on ice for 10 minutes.
↓
Preculture in Ura-ASW-AA liquid medium (proliferation
occurs based on the incorporated uracil).
↓
Culture in Ura-ASW-AA liquid medium (enrichment for the PyrF+
strain).
↓
Culture on Ura-ASW-AA solid medium.

The present invention is not limited to the above condition.
As used herein, the composition of ASW (artificial sea water)
is as follows: 1 x Artificial sea water (ASW) (/L): NaCl 20 g;
MgCl2·6H2O 3 g; MgSO4·7H2O 6 g; (NH4)2SO4 1 g; NaHCO3 0.2 g;
CaCl2·2H2O 0.3 g; KCl 0.5 g; NaBr 0.05 g; SrCl2·6H2O 0.02 g;
and Fe(NH4) citric acid 0.01 g.
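
The per-litre composition above scales linearly with volume
and strength, which can be sketched as follows. The function
name and the rounding are illustrative choices; the amounts
are those listed for 1 x ASW.

```python
# Grams per litre of 1 x ASW, as listed in the text.
ASW_1X_G_PER_L = {
    "NaCl": 20.0,
    "MgCl2.6H2O": 3.0,
    "MgSO4.7H2O": 6.0,
    "(NH4)2SO4": 1.0,
    "NaHCO3": 0.2,
    "CaCl2.2H2O": 0.3,
    "KCl": 0.5,
    "NaBr": 0.05,
    "SrCl2.6H2O": 0.02,
    "Fe(NH4) citric acid": 0.01,
}


def asw_recipe(volume_l, strength=1.0):
    """Grams of each salt for the given volume (L) and strength (x)."""
    return {
        name: round(grams * strength * volume_l, 6)
        for name, grams in ASW_1X_G_PER_L.items()
    }


# Example: 0.5 L of 0.8 x ASW, the strength used in the suspension step.
for name, grams in asw_recipe(0.5, strength=0.8).items():
    print(f"{name}: {grams} g")
```

The 80 mM CaCl2 added in the suspension step is separate
from this recipe and is not included here.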



A method for selecting an organism in which homologous
recombination has occurred may be performed by detecting
a marker specific for the organism in which homologous
recombination has occurred. Accordingly, it is preferable
to use, in the above-mentioned vector, a marker which is
expressed in the organism upon occurrence of homologous
recombination.

Identification of a thermostable protein may be
performed by determining that the protein of interest is
observed to have an activity under the same condition under
which the protein usually attains the activity, except that
only the temperature is changed to about 50 °C, preferably
to about 60 °C, more preferably to about 70 °C, still more
preferably to about 80 °C, most preferably to about 90 °C.
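
The temperature tiers named above lend themselves to a simple
readout. In the sketch below, the activity values, the
threshold for counting a protein as "active", and the
function name are all hypothetical; the text does not specify
how activity is to be scored.

```python
# Temperature tiers (degrees C) named in the text, least to most preferred.
TIERS_C = [50, 60, 70, 80, 90]


def highest_active_tier(activity_by_temp_c, threshold):
    """Highest tier at which measured activity meets the threshold.

    Returns None when the protein is not active at any tier.
    Temperatures without a measurement are treated as inactive.
    """
    active = [
        temp
        for temp in TIERS_C
        if activity_by_temp_c.get(temp, 0.0) >= threshold
    ]
    return max(active) if active else None


# Hypothetical specific activities (arbitrary units) per temperature.
measurements = {50: 12.0, 60: 11.5, 70: 9.8, 80: 4.1, 90: 0.2}

print(highest_active_tier(measurements, threshold=1.0))
```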

In another aspect, the present invention provides a
kit for screening for a thermoresistant protein. The kit
comprises A) a thermoresistant living organism; and B) a
vector comprising a sequence complementary to the selected
region and a gene encoding a candidate for the thermoresistant
protein.

In a preferable embodiment, the thermostable organism
is a hyperthermophilic archaebacterium, and more preferably,
Thermococcus kodakaraensis KOD1.

In a preferable embodiment, the kit of the present
invention further comprises C) an assay system for
identifying the thermoresistant protein. The assay system
may vary depending on the activity of the thermostable
protein of interest.

(Description of each gene) 

Hereinafter, each gene comprised in the genomic
sequence of Thermococcus kodakaraensis KOD1 strain, as
identified in the present invention, is described.

(Overview of the genome of hyperthermophilic
bacteria)

Chromosomal DNA of hyperthermophilic bacteria is
stable. As double-stranded DNA is maintained by hydrogen
bonds, the question arises whether it dissociates into
single strands at high temperatures. The KOD-1 strain has
two types of basic histone-like proteins, which bind to
the negatively charged DNA to form a compact, stabilized
nucleosome-like complex. In the present invention,
polyamines may be used to further enhance stabilization
by binding to the complex. Acetylated polyamine (acetyl
polyamine) binds only weakly to the nucleosome-like
complex, whereas the polyamine obtained by the action of
a deacetylating enzyme can bind to it more firmly.
Generally, hyperthermophilic bacteria contain much more
intracellular K+ ion than normal-temperature bacteria, and
this should contribute to the stabilization of
double-stranded DNA. Indeed, this property is clearly
demonstrated when the melting curve of such DNA is
observed.

(Universality of thermophilic property)

The present inventors have found universal properties
in proteins from hyperthermophilic bacteria through
studies of glutamate dehydrogenase (GDH) of the KOD-1 strain.
That is, it has been demonstrated that proteins from
ordinary-temperature bacteria generally denature due to heat,
whereas recombinant proteins from hyperthermophilic
bacteria mature once heat is applied. GDH synthesized under
high-temperature conditions in the KOD-1 strain has a
hexamer structure and high specific activity. On the other
hand, when the GDH gene is expressed in E. coli as a host,
the resulting GDH has weaker enzymatic activity than the
natural form, and is a monomeric protein having a different
structure. It was demonstrated that, after heat treatment at
70 °C for twenty minutes, the recombinant GDH developed
specific activity and a three-dimensional structure similar
to those of the natural GDH. Once heat-treated, the enzyme
behaved similarly to the natural GDH even in the lower
temperature range. Such features were observed not only for
GDH but also for all the enzymes from hyperthermophilic
bacteria analyzed by the present inventors. As such, heat is
important for maturation of thermostable proteins, and it was
determined that this is due to an irreversible structural
change of the enzymatic proteins caused by heat.

(Discovery of enzymes having new structures and 
functions) 

Ribulose 1,5-bisphosphate carboxylase (Rubisco) is
present in all plants, algae, and cyanophytes, and plays
an important role in fixing carbon dioxide into organic
material. Rubisco is the most abundant enzyme on earth, and
is expected to contribute heavily to the solution of global
warming or the greenhouse effect, and of food problems. To date,
archaebacteria, which are close to primordial living
organisms, have been believed not to possess Rubisco; however,
the present inventors have discovered a Rubisco having high
carbon dioxide fixation ability in the KOD-1 strain. The
present enzyme (Tk-Rubisco) has twenty times greater
activity than conventional Rubisco, and its specificity
for carbon dioxide is extremely high. Tk-Rubisco is
novel in terms of structure, possessing the novel
structure of a pentagonal decamer. Presently, analysis of
the physiological role of the present enzyme, its
introduction into plants, and the like are being performed.

(Analysis of thermostable mechanism of proteins from
hyperthermophilic bacteria based on three-dimensional
structure)

The high thermostability exhibited by proteins derived
from hyperthermophilic bacteria is of interest not only in
the basic field of protein science but also in a variety of
applied fields using the enzymes. The present inventors have
clarified a number of three-dimensional structures of
enzymes derived from the KOD-1 strain, and have also clarified
a number of thermostabilization mechanisms. A typical example
is O6-methylguanine-DNA methyltransferase
(Tk-MGMT). Comparing the three-dimensional structures of
Tk-MGMT and its counterpart derived from E. coli (AdaC), it was
demonstrated that Tk-MGMT has a number of intrahelical ionic
bonds stabilizing alpha-helices. Further, there were also a
number of intrahelical ionic bonds stabilizing the global
protein structure. It was shown that AdaC derived from E.
coli has fewer such ionic bonds, and thus that enzymes derived
from hyperthermophilic bacteria attain high
thermostability through a number of ionic bonds and ionic bond
networks. This is also true of the above-mentioned GDH, and
has also been demonstrated biochemically. That is, when
site-directed mutations disrupting the ionic bond networks
present inside the GDH are introduced, the thermostability of
the variant enzyme is greatly reduced. On the other hand, a
variant enzyme with increased ionic bonds showed enhanced
thermostability.



(Use of useful enzymes)
The polymerase chain reaction (PCR) method is an essential
technology in genetic engineering, and its applications
range from medicine and environmental fields to the food
industry and the like. The improvements presently required
of PCR methods are shortening of amplification time,
prevention of misamplification, and amplification of long DNA
fragments. In particular, clinical and food tests require
DNA polymerases that synthesize DNA rapidly and accurately. As a
result of our functional analysis of the DNA polymerase (KOD
DNA polymerase) from the KOD-1 strain, we found that, in
comparison with conventional enzymes, the present enzyme has
an improved ability to synthesize longer DNA and a higher
speed of DNA synthesis. In fact, when the
DNA polymerase from the KOD-1 strain is used, PCR takes only
25 minutes, whereas the conventional Taq enzyme takes two
hours. Further, an enzyme modified with respect to the
3'->5' exonuclease activity of the KOD DNA polymerase and
the wild-type enzyme can be mixed in an appropriate ratio
to yield significantly superior reaction efficiency and
amplification properties. Further, the present inventors
have used an antibody to the KOD DNA polymerase to suppress
the misamplification often seen in the initial period of PCR
reactions, and thus could establish an extremely efficient
DNA amplification system. The present system is now
commercially available from TOYOBO as "KOD-Plus-" in Japan,
and elsewhere, including Europe and America, through Life
Technologies/GIBCO BRL as "Platinum™ Pfx DNA
polymerase". Recently, the present inventors have further
analyzed the KOD DNA polymerase to determine its
three-dimensional structure. The detailed three-dimensional
structure made it possible to analyze how the structure
relates to the speed of the elongation reaction of the
present enzyme, the accuracy of replication, and the like.

The present inventors have identified and analyzed a
number of useful thermophilic enzymes other than DNA
polymerases. DNA ligases catalyze the joining of the
termini of two DNA fragments, and thus are essential enzymes
for genetic engineering. Most conventional enzymes from
bacteria and phages are sensitive to heat and unstable.
However, the DNA ligase from the KOD-1 strain (Tk-Lig) exhibited
high DNA ligase activity from 30 to 100 °C. Further, the substrate
specificity of Tk-Lig at the nick site (base-pairing) was
interesting: it turned out that accurate base-pairing was
necessary at the 3' terminus, while
substrate specificity was loose at the 5' terminus.
No DNA ligase having such features has been reported to
date, and Tk-Lig is expected to be applicable to the detection
of single nucleotide polymorphisms (SNPs). Sugar-related
enzymes whose biochemical properties have been identified
include alpha-amylase, which digests the alpha(1-4) bonds that
appear in starch and the like; cyclodextrin glucanotransferase,
which catalyzes cyclization to synthesize cyclodextrin;
and 4-alpha-glucanotransferase, which catalyzes a transferase
reaction. Beta-glucosidase, which digests the beta(1-4) bonds
that appear in cellulose and chitin, and chitinase were also
analyzed in detail. Two chitinase activities are present
on the same polypeptide chain in the chitinase from the KOD-1
strain; one is responsible for endochitinase activity,
while the other is responsible for exochitinase activity.
These catalytic domains attain extremely high chitin
degrading activity by synergy.

(Genomic analysis of Thermococcus kodakaraensis
KOD-1 strain and Development of gene introduction
technology)

Through the present studies, the present inventors
have analyzed substantially all the genes relating to the
KOD-1 strain, and revealed detailed biochemical properties
of a huge variety of proteins. The KOD-1 strain is a simple
organism, located in the vicinity of the bottom of the
evolutionary tree of organisms, and thus is believed to be
a good tool for understanding the basic mechanisms of life.
Further, the KOD-1 strain produces a number of thermostable
enzymes with broad applicability and novel enzymes with novel
features, as described above. Against this background, the
present inventors have proceeded with analysis of the entire
genome of the KOD-1 strain. The genome of the KOD-1 strain
consists of 2,076,138 base pairs, and is very short, as we
had expected (40 % or less of that of E. coli). Further,
there were about 1,500 genes. As the KOD-1 strain maintains
its life with such a low number of genes, research on the
present bacterium is expected to allow analysis of the basic
principles of life.

The most important object of research in the
post-genomic era is to analyze the physiological roles of
unknown genes. Exhaustive gene expression analysis by DNA
chips and exhaustive protein analysis by proteomics are
effective analysis methods for this purpose. The present
inventors have proceeded using these methods, and recently
have succeeded in constructing a novel system, an
important new technology for specifically disrupting any
gene of interest on the genome of the KOD-1 strain. This
technology is used to disrupt a functionally-known gene to
allow analysis and clarification of the physiological role
thereof.



Genes comprised in the genome of KOD1 encompass a
variety of species as listed in Table 2 below. Such genes
are described in biochemistry references well
known in the art (such as Sambrook, J. et al., Molecular
Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY, USA (2001);
Ausubel, F. et al., Short Protocols in Molecular Biology,
4th ed., John Wiley & Sons, NJ, USA (1999); Ausubel, F. et
al., Current Protocols in Molecular Biology, John Wiley &
Sons, NJ, USA (1988); Jiro Ota ed., Biochemistry
Handbook, Asakura Shoten (1987); Kazutomo Imabori, Tamio
Yamakawa ed., Seikagaku Jiten (Dictionary of BIOCHEMISTRY),
Third Edition, Tokyo Kagaku Dojin (1998); Yasudomi
NISHIDZUKA ed., Saibokino to Taisha mappu (Cellular
Functions and Metabolism map), Tokyo Kagaku Dojin (1997);
Lewin, Genes VII, Oxford University Press, Oxford, UK (2000)
and the like). Further, methods for measuring the function
of a protein are described in, for example, Sambrook, J. et
al., Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA
(2001); Frank T., et al., Thermophiles (Archaea: A
Laboratory Manual 3), Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY, USA (1995); KOSOGAKU HANDOBUKKU
(Enzyme handbook), edited by Bunji MARUO and Nobuo TAMIYA,
published by Asakura Shoten (1982); Methods in Enzymology
series, Academic Press; Kazutomo Imabori, Tamio Yamakawa ed.,
Seikagaku Jiten (Dictionary of BIOCHEMISTRY), Third Edition,
Tokyo Kagaku Dojin (1998); Yasudomi NISHIDZUKA ed.,
Saibokino to Taisha mappu (Cellular Functions and Metabolism
map), Tokyo Kagaku Dojin (1997); Lengeler, J. et al., Biology
of the Prokaryotes, Blackwell Science, Oxford, UK (1998);
Lewin, Genes VII, Oxford University Press, Oxford, UK (2000)
and the like.

As such, the functions of the genes comprised in the genome
of KOD1 are revealed by the present invention, and are
summarized in the following table. Table 2 describes genes
defined by the region (1) and so on as described in Table 2
(hereinafter, Gene ID No. (1) and the like; the amino acid
sequence of each gene is the sequence corresponding to the
SEQ ID NO: set forth in the table).






KJ002 



[Table 2: entries on this page are illegible in the source scan.]






Predicted DNA modification 
methylase 


Predicted N-acetylglucosaminyl 
transferase 


NAD-dependent aldehyde 
dehydrogenases 


Uncharacterized ACR 


GTPases 


Glycerophosphoryl diester 
phosphodiesterase 


Predicted hydrolases of HD 
superfamily 


ABC-type molybdate transport 
system 


Predicted nucleic acid-binding 
protein 


Regulators of 
stationary/sporulation gene 
expression 


ABC-type sulfate/molybdate 
transport systems 


Predicted ATPases




O 








a 




Ph 






Oh 






1 


PutA 


1 




o 

a 

p 


1 


ModA 


1 


AbrB 


CysU 




o 


m 


o 
cn 




r~ 

<N 


o 
cn 


o 
in 


o 

in 
in 


VO 


in 


ON 

cn 


VO 
SO 
CN 


8726 


9572 


10299 


10787 


11414 


11646 


mis 


13391 


13833 

j 


14000 


14885 


15962 


7658 


9011 


10104 


10385 


10859 


11445 


11759 


12404 


13425 


13841 


14159 


15371 


r-3 


f-2 


f-3 


f-2 


r-3 


f-3 


f-2 


f-2 


r-2 


r-3 


f-2 


f-2 


2157 


m 




cn 


2156 


in 

CN 

r-- 


in 
cn 


VO 

cn 


1836 


2155 


cn 


oo 

cn 


2080623 


2079285 


2078999 


2078571 


2077962 


2077652 




2075967 


2075537 


2075322 


2074482 


2073414 


2081723 


2080535 


2079283 


2079002 


2078570 


2077972 


2077655 


2077040 


2075986 


2075570 


2075225 


2074139 


8755 


10093 


10379 


10807 


11416 


11726 


12286 


13411 


13841 


14056 


14896 


15964 


7655 


8843 


10095 


10376 


10808 


11406 


11723 


12338 


13392 


13808 


14153 


15239 


r- 


OO 


OS 


o 




CN 


cn 




in 


VO 




OO 






Predicted ATPases of PP-loop 
superfamily 


ABC-type sulfate/molybdate 
transport systems 


Membrane protease subunits 


Na+-transporting 
NADH:ubiquinone 
oxidoreductase alpha subunit 


Archaea-specific RecJ-like 
exonuclease 


Polyribonucleotide 
nucleotidyltransferase 
(polynucleotide phosphorylase) 


Predicted phosphatases 


Uncharacterized ACR 


S-adenosylhomocysteine 
hydrolase 


Zn-dependent hydrolases 


Uncharacterized ACR 


RNAse P protein subunit RPR2 


Pi 




O 














oi 




* 




CysA 


HflC 


NqrA 


1 


Pnp 


Gph 


1 


SAM1


GloB 




RPR2 


cs 


oo 


o 


OS 

cs 


m 


o 


(N 
<N 


o 

CN 

m 


OS 

vo 




oo 


ON 


16649 


17686 


18437 


19251 


19407 


20885 


21908 


22552 


24193 


24808 


25446 


25770 


16505 


16708 


17879 


18792 


19293 


20645 


21269 


21931 


22921 


23953 


24879 


25476 


f-2 




m 


r-2 


r-2 


r-3 


r-3 


r-l 


r-l 


f-1 


f-3 


r-2 


ON 

m 




2154 


1835 


1834 


2153 


2152 


1466 


1465 


vo 


vo 
CN 


1833 


2072679 


2071681 


2070585 


2070098 


2069195 


2068191 


2067459 

i 


2066809 


2065183 


2064544 


2063927 


2063567 


2073227 


2072682 


2071598 


2070592 


2070088 


2069195 


2068112 


2067465 


2066781 


2065431 


2064565 


2063965 


16699 


17697 


18793 


19280 


20183 


21187 


21919 


22569 


24195 


24834 


25451 


25811 


16151 


16696 


17780 


18786 


19290 


20183 


21266 


21913 


22597 


23947 


24813 


25413 


OS 


o 

CN 




<N 
CN 




CN 


CN 


vo 
CN 


r~ 


OO 
CN 


CN 


o 






Predicted ATPase involved in
replication control


ATPase involved in DNA repair


O 

Co 


Uncharacterized proteins of 
WD40-like repeat family 


Uncharacterized ArCR 


SAM-dependent 
methyltransferases COG0500 
SmtA 


Archaeal flagellins(flagellin) 


Archaeal flagellins(flagellin) 


Archaeal flagellins(flagellin) 


Archaeal flagellins(flagellin) 


Archaeal flagellins(flagellin) 


Putative archaeal flagellar protein 
C 


Putative archaeal flagellar protein 
D/E 


Putative archaeal flagellar protein 
G 










CO 






Z 














o 


SbcC 


UshA 




1 


SmtA 


FlaB 


FlaB 


FlaB 


FlaB 


FlaB 


FlaC 


FlaD 


FlaG 


On 
<N 


<N 


m 


CN 


ON 


o 


rs 
o 
cs 




o 

CN 


CN 
VO 
CN 


O 
C3N 
CN 


OO 


00 
<N 


<^ 


27364 


28012 


29116 


30655 


31264 


32182 


33087 


33636 


35804 


36533 


37378 


37868 


39296 


40318 


25930 


27568 


28777 


29791 


31102 


31414 


32382 


33309 


35048 


35888 


36553 


37541 


38870 


39862 






1 




<^ 




f-3 


f-3 


f-2 


CN 


f-1 


f-2 


f-2 


f-1 


1464 




1463 


oo 




o 


CN 


OO 
CN 


O 
un 
CO 


CO 




cn 


CO 

wo 

CO 


CN 


2061982 


2060758 


2060044 


2058697 


2058112 


2057143 


2056127 


2054345 


2053554 


2052837 


2051998 


2051508 


2050080 


2049046 


2063565 


2061813 


2060787 


2059596 


2058276 


2057964 


2057011 


2056087 


2054330 


2053496 


2052825 


2051984 


2051504 


2049618 


27396 


28620 


29334 


30681 


31266 


32235 


33251 


35033 


35824 


36541 


37380 


37870 


39298 


40332 


25813 


27565 


28591 


29782 


31102 


31414 


32367 


33291 


35048 


35882 


36553 


37394 


37874 


39760 




cn 


CO 

cn 




in 


m 




OO 

cn 


ON 
CO 


o 


5^ 


CN 




5 






Predicted ATPases involved in 
biogenesis of archaeal flagella 


Predicted ATPases involved in 
pili and flagella biosynthesis 


Uncharacterized membrane 
component of archaeal flagella 


Predicted helicases 


Protein-L-isoaspartate 
carboxylmethyltransferase 


Phosphoserine phosphatase


Phosphoserine phosphatase 


Signal peptidase 




Serine/threonine protein kinases 


Uncharacterized ACR 


STAS domain protein 


Predicted hydrolases of the HAD 
superfamily 


Phosphatidylglycerophosphate 
synthase 


Uncharacterized ArCR 


Z 






Pi 


O 


pq 


w 








c/:} 


Pi 






CO 


FlaH 


> 


FlaJ 


1 


Pcm 


SerB 


SerB 


< 




SPS1






1 


PgsA 


1 


OO 

m 


o 


vo 
vo 


VO 
m 


'id- 

<N 


o 

vo 


CO 


(N 
<N 




oo 

CN 


o 

CN 


OS 
CN 


VO 
CN 


OS 

cs 


vo 
CN 


41068 


42692 


44436 


46073 


46986 


47321 


47794 


49128 




49669 


50292 


50461 


51410 


52056 


52603 


40372 


41072 


42696 


45869 


46497 


47171 


47320 


47943 




49528 


49728 


50290 


50705 


51492 


52069 


f-1 


f-2 


f-3 


f-2 


f-3 


f-2 


f-1 


r-2 


r-l 


f-1 


f-3 


r-l 


f-2 


r-2 


r-l 




cn 


CN 


m 


O 

cn 


VO 
m 




1832 


1462 




m 
r- 


1461 


cn 


1831 


1460 


2048308 


2046684 


2044934 


2042943 


2042387 


2041962 


2041579 


2040239 


2040049 


2039647 


2039081 


2038819 


2037966 


2037317 


2036773 


2049018 


2048306 


2046682 


2044937 


2042908 


2042207 


2042061 


2041441 


2040225 


2039985 


2039650 


2039100 


2038685 


2037895 


2037315 


41070 


42694 


44444 


46435 


46991 


47416 


47799 


49139 


49329 


49731 


50297 


50559 


51412 


52061 


52605 


40360 


41072 


42696 


44441 


46470 


47171 


47317 


47937 


49153 


49393 


49728 


50278 


50693 


51483 


52063 


»0 






oo 


On 


o 
«o 




CN 


m 
un 






vo 




OO 


ON 



[Table 2, continued: entries on these pages are illegible in the source scan.]



Regulators of 
stationary/sporulation gene 
expression 


Acyl-CoA synthetases 
(AMP-forming)/AMP-acid 
ligases II COG0318 CaiC


Predicted ATPase involved in 
replication control 


Fe-S oxidoreductases 


Serine proteases of the peptidase 
family S9A 


Molecular chaperones (contain 
C-terminal Zn finger domain) 




Na+-dependent transporters of the 
SNF family 


Integrase 


Diphthamide biosynthesis 
methyltransferase DPH5 


Mn-dependent transcriptional 
regulator 




(y 




u 


a 


O 




Pi 








AbrB 


CaiC 


u 


GlpC 




DnaJ 






XerC 


DPH5 


TroR 


m 


cs 


CO 
CO 
CN 


CO 
CO 


00 
(N 


vo 
CN 


Os 

cs 


00 

CS 


o 
o 


00 


(N 


80058 


80402 


83075 


83602 


84109 


84420 


84731 


85176 


85847 


87128 


87619 


79968 


80318 


81101 


83440 


83947 


84303 


84530 


85002 


85448 


86345 


87226 


f-3 


f-2 


f-2 


f-l 


f-l 


CO 


f-2 


f-3 


f-2 


r-3 


f-l 


ON 
CO 


VO 
VO 
CO 


VO 
CO 


'3- 

ts 


m 
cs 


o 


OO 

VO 
cn 


5^ 
t>- 


ON 
VO 
CO 


2149 


vo 

CN 


2009249 


2008950 


2006202 


2005750 


2005111 


2004938 


2004360 


2004038 


2003430 


2002239 


2001715 


2009410 


2009132 


2008946 


2005947 


2005470 


2005114 


2004917 


2004379 


2003957 


2003045 


2002167 


80129 


80428 


83176 


83628 


84267 


84440 


85018 


85340 


85948 


87139 


87663 


79968 


80246 


80432 


83431 


83908 


84264 


84461 


84999 


85421 


86333 


87211 


OS 
00 


o 

ON 


OS 


(N 


CO 
ON 


ON 


VO 


so 

OS 


as 


oo 

o\ 


ON 






Na+-driven multidrug efflux 
pump 


DNA polymerase III alpha 
subunit 


Predicted hydrolases of the HAD 
superfamily 


Predicted Zn-ribbon 
RNA-binding protein with a 
function in translation 


Translation elongation factor 
EF-lbeta 


Histone acetyltransferase HPA2 
and related acetyltransferases 
COG0454 WecD 


Chorismate synthase 


C 
• ^ 

2 

t 

2 


Uncharacterized 

NAD(FAD)-dependent 

dehydrogenases 


Glycine/D-amino acid oxidases 
(deaminating) 








> 




^ 06 


w 


W Pi 




w 


NorM 


Ah 






EFB1


WecD 


AroC 


PutP 


HcaD 


DadA 


ON 
CO 


CO 


OO 


CO 


VO 


CNl 

CO 


oo 

<N 


CN 
OO 




CO 

VO 


88224 


88851 


90003 


90265 


90558 


90976 


91355 


92974 


94539 


95710 


87912 


88395 


ON 

CO 
ON 
OO f 


90088 


90285 


90811 


91268 


91363 


93072 


94567 


CO 


CO 


CO 


(«^ 


CO 


tin 


CN 




CO 




<N 


CO 




CN 


wo 


1458 


O 
CO 


oo 

fN 




tN 


2001113 


2000099 


1999319 


1999111 


1998818 


1998322 


1998012 


1996399 


1994828 


1993666 


2001715 


2001112 


2000071 


1999299 


1999102 


1998795 


1998200 


1998015 


1996306 


1994826 


88265 


89279 


90059 


90267 


90560 


91056 


91366 


92979 


94550 


95712 


87663 


88266 


89307 


90079 


90276 


90583 


91178 


91363 


93072 


94552 


o 
o 


o 


CN 
O 


CO 

o 


^ 
o 


O 


VO 
O 


o 


oo 
o 


OS 

o 






Uncharacterized 

NAD(FAD)-dependent 

dehydrogenases 


Fe-S-cluster-containing 
hydrogenase components 2 


Glycine/D-amino acid oxidases 
(deaminating) 




Uncharacterized ACR 


Histone acetyltransferase HPA2 
and related acetyltransferases 
COG0454 WecD 


Predicted transcription factor 




Uncharacterized ArCR 


Periplasmic serine proteases 
(ClpP class) COG0616 SppA 


Uncharacterized ACR 


Transcription initiation factor 
TFIID (TATA-binding protein) 


RecA-superfamily ATPases 
implicated in signal transduction 


Pi 


u 








^ Pi 








:z o 






H 


HcaD 


HycB 


DadA 






WecD 






1 


SppA 


1 


SPT15 


to 

^ to 


<N 
O 


oo 

CN 




OO 

as 
cn 


o 
m 


<N 
fN 


vo 
O 
CN 




CN 
OO 


o 


m 
CN 


vo 
m 


o 
m 


97601 


98127 


99581 


100881 


101098 


101695 


102315 




103364 


104313 


106099 


106759 


107104 


96185 


97629 


98474 


99654 


100975 


101239 


101805 




103016 


103539 

1 


104398 


106210 


106894 


f-2 


f-3 


f-2 


f-3 


r-1 


r-1 


f-3 


f-3 


r-3 


f-3 


f-1 




f-1 


m 




r- 


OO 


1 

1457 


1456 


as 


O 

to 


2148 


to 


O 


CO 


CN 
m 


1991742 


1991231 


1989795 


1988486 


1988173 


1987645 


1987031 


1986815 


1985946 


1985060 


1983277 


1982599 


1981924 


1993193 


1991758 


1990961 


1989730 


1988463 


1988154 


1987582 


1986985 


1986392 


1985902 


1984980 


1983168 


1982544 


97636 


98147 


99583 


100892 


101205 


101733 


102347 


102563 


103432 


104318 


106101 


106779 


107454 


96185 


97620 


98417 


99648 


100915 


101224 


101796 


102393 


102986 


103476 


104398 


106210 


106834 


o 


111 


CM 






to 






oo 




O 
CN 


CN 


CN 
CN 



[Table 2, continued: entries on this page are illegible in the source scan.]



ATPases involved in DNA repair 


Predicted membrane protein 


Phosphate/sulphate permeases 


Spermidine synthase 


Hydrolases of the alpha/beta
superfamily


Uncharacterized ACR 




Glycine cleavage system H 
protein (lipoate-binding) 


Putative stress-responsive 
transcriptional regulator 
COG1983 PspC 




Triphosphoribosyl-dephospho-CoA
synthetase


Archaea-specific RecJ-like 
exonuclease 


Archaea-specific RecJ-like 
exonuclease 


Methyl-accepting chemotaxis 


hJ 




{X, 


w 








w 


KT 












RecN 




PitA 


SpeE 








GcvH 


PspC 




CitG 






Tar 




cn 


o 
cn 




cn 


ON 
VO 

cs 






o 




cn 
cn 


Os 
<N 


1300 


oo 


117054 


117835 


118379 


119931 


120420 


120947 




121854 


122256 




123508 


123710 


126146 


128553 


116700 


117556 


118235 


119100 


120156 


120479 




121443 


122007 




122680 


123599 


123932 


126333 


r-2 


r-1 


r-3 


r-2 


f-3 


r-3 


r-3 


f-3 


f-3 




f-1 


r-3 


r-3 


f-3 


1826 


1453 


2146 


1825 


r- 


2145 


2144 


un 


VO 


cn 
cn 


cn 


2143 


2142 




1971884 


1971136 


1970667 


1969439 


1968893 


1968426 


1968186 


1967522 


1966940 


1966711 


1965784 


1965510 


1963221 


1960817 


1972702 


1971903 


1971200 


1970317 


1969405 


1968899 


1968257 


1967974 


1967371 


1966947 


1966710 



1965800 


1965446 


1963072 


117494 


118242 


118711 


119939 


120485 


120952 


121192 


121856 


122438 


122667 


123594 


123868 


126157 


128561 


116676 


117475 


118178 


119061 


119973 


120479 


121121 


121404 


122007 


122431 


122668 


123578 


123932 


126306 





protein 


Permeases 


ABC-type
sugar/spermidine/putrescine/iron/thiamine
transport systems


ABC-type thiamine transport 
system 


L-aminopeptidase/D-esterase 
COG3191 DmpA 


ABC-type multidrug transport 
system 


Putative hemagglutinin/hemolysin 


Permeases of the major facilitator 
superfamily 


Uncharacterized BCR 


Methionine aminopeptidase 


Fe-S oxidoreductases family 2 


Transcriptional regulators 


Predicted transcriptional 
regulators 




Pi 


O 


X 


EQ 






Pi 




> 


u 










MalK 


ThiP 


DmpA 


CcmA 


FhaB 






Map 




Lrp 






oo 

fS 

'O 


tN 
On 
<^ 


oo 
>n 


ON 
CO 


CS 

oo 


O 
CO 


CO 
CO 


CN 
CO 


oo 


'id- 
ol 








130011 


131110 


133029 


133831 


134527 


134763 


135215 


138005 


138671 


140970 


141294 


141797 




128640 


130150 


131409 


132856 


133900 


134589 


135020 


137828 


138590 


139365 


141087 


141335 




r-2 


7. 


r-2 


f-1 


r-1 


r-2 


cn 


r-3 


r-3 


f-3 


f-3 


f-2 




1824 


1452 


1823 


UO 
CO 


1451 


1822 


2141 


2140 


2139 


OO 
WO 


ON 
WO 


WO 
CO 




1959365 


1958224 


1956329 


1955488 


1954831 


1954544 


1953624 


1951206 


1950702 


1948406 


1948067 


1947522 




1960747 


1959228 


1958230 


1956633 


1955493 


1954834 


1954400 


1951901 


1950857 


1950013 


1948300 


1948043 




130013 


131154 


133049 


133890 


134547 


134834 


135754 


138172 


138676 


140972 


141311 


141856 




128631 


130150 


131148 


132745 


133885 


134544 


134978 


137477 


138521 


139365 


141078 


141335 









Endonuclease IV 


ATPase involved in DNA repair


Predicted membrane protein 


Uncharacterized ACR 


Uncharacterized ACR 


Uncharacterized ACR 


Superfamily I DNA and RNA 
helicases and helicase subunits 


Predicted nucleic-acid-binding 
protein containing a Zn-ribbon 


Acetyl-CoA acetyltransferases 


3-hydroxy-3-methylglutaryl CoA
synthase


Uncharacterized ACR 


Uncharacterized ACR 


Fibrillarin-like rRNA methylase 


Protein implicated in ribosomal 
biogenesis 


Translation initiation factor 
eIF-2B delta subunit 
















Pi 










1— > 




>-> 


Nfo 


SbcC 




1 


f 








PaaJ 


PksG 


1 


1 


NOP1


SIK1


GCD2 




o 






m 
oo 
m 


vo 
CN 


cn 
o\ 


o 
m 

tN 


CN 
VO 


(N 

oo 
to 


>o 


vo 


vo 


m 
to 
to 


OO 

to 


142702 


143602 


144896 


145224 


145949 


146553 


149253 


149695 


150872 


151926 


152433 


152738 


153485 


154609 


155879 


141862 


142903 


143765 


144936 


145334 


146016 


147309 


149293 


149708 


150876 


152076 


152417 


152810 


153487 


154919 


r-1


r-1


r-3 


r-2 


f-2 


r-2 


r-2 


r-1


r-3 


r-2 




f-2 


r-3 


r-1


r-3 


1450 


1449 


2138 


1821 


VO 

m 


1820 


1819 


1448 


2137 


1818 


O 
VO 


r-. 
m 


2136 


1447 


2135 


1946671 


1945585 


mmi 


1944143 


1943427 


1942775 


1940105 


1939681 


1938504 


1937450 


1936907 


1936635 


1935888 


1934626 


1933497 


1947525 


1946646 


1945622 


1944454 


1944044 


1943371 


1942171 


1940085 


1939679 


1938502 


1937302


1936961 


1936577 


1935891 


1934534 


142707 


143793 


144931 


145235 


145951 


146603 


149273 


149697 


150874 


151928 


152471


152743 


153490 


154752 


155881 


141853 


142732 


143756 


144924 


145334 


146007 


147207 


149293 


149699 


150876 


152076 


152417 


152801 


153487 


154844 


















phosphatase 


Histidyl-tRNA synthetase 


ATP phosphoribosyltransferase 
(histidine biosynthesis) 


Histidinol dehydrogenase 


Imidazoleglycerol-phosphate 
dehydratase 


Glutamine amidotransferase 


Phosphoribosylformimino-5-aminoimidazole
carboxamide ribonucleotide (ProFAR)
isomerase


Imidazoleglycerol-phosphate 
synthase 


Phosphoribosyl-AMP 
cyclohydrolase 


Histidinol-phosphate
aminotransferase/Tyrosine
aminotransferase


Predicted phosphatases


Indole-3-glycerol phosphate
synthase 


Anthranilate 

phosphoribosyltransferase 


Anthranilate/para-aminobenzoate 
synthases component I COG0147 
TrpE 


Anthranilate/para-aminobenzoate 
synthases component II 
COG0512 PabA


Phosphoribosyl anthranilate 
isomerase 


Tryptophan synthase beta chain 


Tryptophan synthase alpha chain 


Prephenate dehydrogenase 


PLP-dependent aminotransferases 


Chorismate mutase 


Chorismate synthase 


5-enolpyruvylshikimate-3-phosphate
synthase


Archaeal shikimate kinase 
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COG1685 - 


Shikimate 5-dehydrogenase 


Shikimate 5-dehydrogenase 


3-dehydroquinate dehydratase 


3-dehydroquinate synthetase 


3-Deoxy-D-arabino-heptulosonate
7-phosphate (DAHP) synthase 


Transketolase 


Transketolase 


AraC-type DNA-binding 
domain-containing proteins 


Pyrroline-5-carboxylate reductase 


Pyrroline-5-carboxylate reductase 


Acetylornithine
deacetylase/Succinyl-diaminopimelate
desuccinylase and related
deacylases


Acetylornithine
deacetylase/Succinyl-diaminopimelate
desuccinylase and related
deacylases


PLP-dependent aminotransferases 


Acetylglutamate kinase 


Acetylglutamate semialdehyde 
dehydrogenase 


Glutathione synthase/Ribosomal 
protein S6 modification enzyme 
(glutaminyl transferase) 
COG0189 RimK 


Uncharacterized 
paraquat-inducible protein A 


Isocitrate/isopropylmalate 
dehydrogenase 


3-isopropylmalate dehydratase 
small subunit 


3-isopropylmalate dehydratase 
large subunit 


Isopropylmalate/homocitrate/citramalate
synthases
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Nitroreductase 


Predicted ATPase of the AAA 
superfamily 


Predicted ATPase of the AAA 
superfamily 


Metal-dependent hydrolases of 
the beta-lactamase superfamily I 


Pyruvate-formate lyase 


Ankyrin repeat proteins 


Uncharacterized ACR 


Uncharacterized ArCR 




Tellurite resistance protein and 
related permeases 


Quinolinate synthase 


Aspartate oxidase 
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Uncharacterized 

NAD(FAD)-dependent 

dehydrogenases 


Uridylate kinase


GTPases - translation elongation
factors COG0050 TufB


Translation elongation and release 
factors (GTPases) 


Translation elongation and release 
factors (GTPases) 




HD superfamily 
phosphohydrolases 


Acetylornithine
deacetylase/Succinyl-diaminopimelate
desuccinylase and related
deacylases
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Putative periplasmic protein 
kinase ArgK and related GTPases 
of G3E family 


Lactoylglutathione lyase and 
related lyases 


Histone acetyltransferase HPA2 
and related acetyltransferases 
COG0454 WecD 


Allophanate hydrolase subunit 2 


Allophanate hydrolase subunit 1 


Predicted nucleic acid-binding 
protein 


Uncharacterized proteins 


Predicted nucleic acid-binding 
protein 


Predicted nucleotide kinase 
(related to CMP and AMP 
kinases) 


DNA-directed RNA polymerase 
sigma subunits 
(sigma70/sigma32) 
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Uncharacterized ACR 


Predicted nucleotidyltransferases 




Predicted nucleic acid-binding 
protein 


3-Methyladenine DNA 
glycosylase 




Membrane-bound serine protease 
(ClpP class) COG1030- 


Membrane protease subunits 




ATPases involved in chromosome 
partitioning 




Preprotein translocase subunit 
SecA (ATPase 


Thymidine phosphorylase 


Excinuclease ATPase subunit 


Molybdenum cofactor 
biosynthesis enzyme 
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ABC-type multidrug transport 
system 


Predicted membrane protein 


ABC-type transport system 
involved in multi-copper enzyme 
maturation 


Uncharacterized ACR 


Uncharacterized ACR 


tRNA and rRNA
cytosine-C5-methylases


Uncharacterized ACR 


Ketopantoate 
hydroxymethyltransferase 


Glycosyltransferases involved in 
cell wall biogenesis 


Sensory transduction histidine 
kinases 


Thiamine biosynthesis ATP 
pyrophosphatase 


Predicted membrane protein 


Thiamine biosynthesis ATP 
pyrophosphatase 
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ABC-type thiamine transport 
system 


Nitroreductase 


Phosphate uptake regulator 




ATPases of the PilT family 


Maleate cis-trans isomerase 


AraC-type DNA-binding 
domain-containing proteins 


Zn-dependent hydrolases 


Catalase 


Predicted transcriptional 
regulators 


Uncharacterized ACR 


Uncharacterized ACR 




Integrase 




Predicted nucleic acid-binding 
protein 
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Outer membrane receptor proteins 


Uncharacterized CBS 
domain-containing proteins 


Glycosyl transferases 


MutS-like ATPases involved in 
mismatch repair 


Replication factor A large subunit 
and related ssDNA-binding 
proteins 




Regulators of
stationary/sporulation gene
expression


Transcriptional regulators


Uncharacterized BCR 


CTP synthase (UTP-ammonia 
lyase) 




Membrane proteins related to 
metalloendopeptidases 
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TPR-repeat-containing proteins 


Replication factor A large subunit 
and related ssDNA-binding 
proteins 




Uncharacterized ACR 


Serine proteases of the peptidase 
family S9A 


Pyrimidine deaminase 


Riboflavin synthase alpha chain 


Predicted membrane-associated 


Transcriptional regulator 


GTP cyclohydrolase II


Riboflavin synthase beta-chain 


Uncharacterized ArCR


ATP-utilizing enzymes of
ATP-grasp superfamily (probably
carboligases)


Phosphoribosylaminoimidazolesuccinocarboxamide
(SAICAR)
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synthase 


Thiamine biosynthesis protein 
ThiC 


Flavoproteins 


Hydroxymethylpyrimidine/phosphomethylpyrimidine
kinase


Uncharacterized ABC-type 
transporter 


Predicted metal-dependent 
membrane protease 


Hydrogenase maturation factor 


FKBP-type peptidyl-prolyl 
cis-trans isomerases 2 


Acyl-CoA dehydrogenases 


Cysteinyl-tRNA synthetase 




Uncharacterized ACR 


Predicted transposase 


Uncharacterized ACR 
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Predicted transcriptional regulator 


Methyl-accepting chemotaxis 
protein 


Serine/threonine protein kinases 


Uncharacterized ACR 


Histidyl-tRNA synthetase 


Uncharacterized ACR 


Regulators of 
stationary/sporulation gene 
expression 


RecB family exonuclease 


Predicted ATPase of the AAA 
superfamily 


Uncharacterized ACR 


Uncharacterized proteins of 
WD40-like repeat family 


Uncharacterized ACR 


Acyl-CoA synthetase (NDP 
forming) 
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Nucleoside-diphosphate-sugar 
epimerases COG0451 WcaG 


Methyl-accepting chemotaxis 
protein 


Predicted ATP-dependent serine
protease 


Multidrug resistance efflux pump 


Reverse gyrase 


Predicted transcriptional 
regulators 


Predicted transcriptional regulator 


Uncharacterized Zn-finger
containing protein 


Arginase/agmatinase/formiminoglutamate
hydrolase


3-hexulose-6-phosphate synthase 
and related proteins 


Cell division GTPase


Predicted hydrolases of the HAD 
superfamily 




2 


O 


(y 


^ 






Pi 


w 


O 


Q 




WcaG 


Tar 


Sms 


EmrK 






1 




SpeB 


SgbH 


FtsZ 




o 






o 
m 


1348 


ON 
O 


OO 

m 




vo 
in 

CM 


OO 

»n 


CN 
VO 
CN 


oo 

CM 


394688 


397378 


398352 


398904 


401933 


405282 


405554 


405955 


406707 


407465 


408796 


409448 


394415 


396901 


398202 


398772 


399050 


404487 


405422 


405640 


405975 


406835 


408082 


408818 


cs 




cs 


m 


CM 


CO 


CO 

a. 


7, 


cs 


CN 
<A 




CN 

<^ 


r~ 


vo 


1779 


cs 
o 
oo 


oo 

cs 


m 
o 
oo 


2090 


1397 


1778 


On 
CN 




O 
cn 


1694628 


1691269 


1690907 


1690367 


1685193 


1684088 


1683747


1683415 


1682669 


1681323 


1680571 


1679916 


1695929 


1694484 


1691200 


1690876 


1690328 


1684894 


1683959 


1683750 


1683418 


1682543 


1681326 


1680569 


394750 


398109 


398471 


399011 


404185 


405290 


405631 


405963 


406709 


408055 


408807 


409462 


393449 


394894 


398178 


398502 


399050 


404484 


405419 


405628 


405960 


406835 


408052 


408809 





Nucleoside-diphosphate-sugar 
epimerases COG0451 WcaG 
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related proteins 
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WD40-like repeat family 


Aspartyl/asparaginyl-tRNA 












hJ 


O 




w 


w 


GO 




WcaG 


1 




NusA 




GyrA 


SurA 


ArgS 


PutA 


o 




AsnS 


o 
m 


cn 


in 
o 

<N 


CN 
CN 


VO 
m 


ON 

cn 


o 
cn 


<^ 


r~ 
cs 


o 
cn 




ILL 


409645 


410307 


411027 


411686 


413045 


413754 


414175 


415123 


417259 


417462 


419017 


420561 


409495 


409902 


410499 


411176 


412490 


413523 


413938 


414877 


417115 


417330 


418663 


419247 


f-1 


f-3 


m 
^ 


f-2 


f-2 


f-3 


f-1 


f-1 


f-1 


f-3 


f-1 


f-3 


oo 


O 
OO 


in 
o 
oo 


m 


m 


VO 

o 
oo 


ON 


O 
OO 


00 


o 
oo 


CN 

oo 


OO 

o 
oo 


1679731 


1678919 


1678298 


1677690 


1676085 


1675463 


1675126 


1674169 


1672108 


1671449 


1670203 


1668815 


1679919 


1679731 


1678918 


1678202 


1677500 


1675963 


1675452 


1674501 


1672269 


1672087 


1670742 


1670131 


409647 


410459 


411080 


411688 


413293 


413915 


414252 


415209 


417270 


417929 


419175 


420563 


409459 


409647 


410460 


411176 


411878 


413415 


413926 


414877 


417109 


417291 


418636 


419247 








Mg2+ and Co2+ transporters 


Predicted GTPase 


Adenylate cyclase 


Transcriptional regulators 


Uncharacterized ACR 


Pyruvate kinase 


Predicted Zn-dependent proteases 


Uncharacterized ArCR 


Integral membrane protein 
possibly involved in chromosome 
condensation 


Uncharacterized ACR 


2-Phosphoglycerate kinase 


Phosphopantothenoylcysteine 
synthetase/decarboxylase 


Cation transport ATPases 


Uncharacterized ACR 


Distinct helicase family with a
unique C-terminal domain
including a metal-binding
cysteine cluster


Predicted hydrolase of the
alpha/beta superfamily


Rubrerythrin 


Rubredoxin 


Desulfoferrodoxin 


ATPases involved in chromosome 
partitioning 


Glycine 

hydroxymethyltransferase 


Large extracellular alpha-helical 
protein 


Cellobiose phosphorylase 


DNA-directed RNA polymerase 
subunit M/Transcription 
elongation factor TFIIS 


Histone acetyltransferase HPA2 
and related acetyltransferases 
COG0454 WecD 
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DNA polymerase III beta subunit 
(Proliferating cell nuclear 
antigen=PCNA) 


Uncharacterized ArCR 


Peroxiredoxin 


Acetyltransferases 


Predicted transcriptional 
regulators 


HD superfamily 
phosphohydrolases 


Molybdopterin biosynthesis 
enzyme 


Ribosomal protein L23 


Molybdopterin biosynthesis 
enzymes 


Uncharacterized ACR 




Predicted phosphoesterases 


PLP-dependent aminotransferases 


Predicted carbamoyl transferase 


Prolyl-tRNA synthetase 
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Lactate dehydrogenase and 
related dehydrogenases COG1052
LdhA 


Cellulase M and related proteins 


Predicted DNA-binding protein 
containing a Zn-ribbon domain 


Uncharacterized ACR 




Translation initiation factor 
eIF-2B alpha subunit 


Predicted ATPases or kinases 


CBS domains 


Predicted transcriptional 
regulators 


Archaeal DNA-binding protein 
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synthase 
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Predicted DNA-binding proteins 
with PD1-like DNA-binding
motif 


Specific archaeal helicases 


Tyrosyl-tRNA synthetase 


ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


ABC-type iron/thiamine transport 
systems 


ABC-type thiamine transport 
system 


ABC-type
sugar/spermidine/putrescine/iron/thiamine
transport systems


Predicted phosphohydrolases 


Transaldolase 


rRNA methylases 


Pyruvate dehydrogenase 


DNA polymerase III beta subunit 
(Proliferating cell nuclear 
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Chromosome segregation 
ATPases 


Predicted secreted acid 
phosphatase 


Predicted hydrolase of alkaline 
phosphatase superfamily 




tRNA-processing ribonuclease 
BN 


Ammonia permeases 


Ankyrin repeat proteins 


Predicted membrane protein 


Uncharacterized protein involved 
in cysteine biosynthesis 


Predicted nucleic acid-binding 
protein 


Predicted RNA-binding protein 
homologous to eukaryotic snRNP 


Ribosomal protein L37AE/L43A 
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Methylase of chemotaxis 
methyl-accepting proteins 
COG1352 CheR 


CheY-like receiver domains 


Chemotaxis response regulator 
CheB 


Chemotaxis protein histidine 
kinase and related kinases 


Chemotaxis protein histidine 
kinase and related kinases 


Chemotaxis protein CheC 


Chemotaxis protein CheC 


Methyl-accepting chemotaxis 
protein 


Chemotaxis protein; stimulates 
methylation of MCP proteins 
COG1871 CheD 


Uncharacterized archaeal 
coiled-coil domain 


Uncharacterized ACR 
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Predicted transposases 




Superfamily I DNA and RNA 
helicases and helicase subunits 


Surface lipoprotein 


ABC-type sugar (aldose) 
transport system 


Uncharacterized ABC-type 
transport system 


Uncharacterized ABC-type 
transport system 


ABC-type Mn/Zn transport 
system 


Pyruvate dehydrogenase 


Uncharacterized stress-induced 
protein 


Predicted 

phosphoribosyltransferases 


Acyl-CoA synthetase (NDP 
forming) 


Uncharacterized protein involved 
in cation transport 
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Zn-dependent protease with 
chaperone function 


Chaperonin GroEL (HSP60 
family) (Chaperonin A) 


Mn2+-dependent serine/threonine 
protein kinase 


Aspartate/aromatic 
aminotransferase 


MutS-like ATPases involved in 
mismatch repair 


Lactate dehydrogenase and 
related dehydrogenases COG1052 
LdhA 


Amino acid permeases 


NAD-dependent protein 
deacetylases 


Predicted hydrolases of the HAD 
superfamily 


Uncharacterized ACR 




Uncharacterized membrane 
protein 
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Anaerobic dehydrogenases 


Fatty acid/phospholipid 
biosynthesis enzyme 


ABC-type Na+ efflux pump 


ABC-type multidrug transport 
system 


ABC-type multidrug transport 
system 




Predicted nucleic acid-binding 
protein 


Predicted DNA binding domain 


Predicted Zn-dependent proteases 
and their inactivated homologs 


Predicted Zn-dependent proteases 
and their inactivated homologs 


Predicted pyrophosphatase 


MinD superfamily P-loop ATPase 
containing an inserted ferredoxin 
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MinD superfamily P-loop ATPase 
containing an inserted ferredoxin 
domain 


Fe-S oxidoreductases 


SAM-dependent 
methyltransferases COG0500 
SmtA 


NAD-dependent aldehyde 
dehydrogenases 


ABC-type Fe3+-siderophores 
transport systems 


ABC-type 

cobalamin/Fe3+-siderophores 
transport systems 


ABC-type 

cobalamin/Fe3+-siderophores 
transport systems 


Putative homoserine kinase type 
II (protein kinase fold) 


Uncharacterized ACR 


Uncharacterized ACR 
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Metal-dependent hydrolases of 
the beta-lactamase superfamily II 




Ferrous ion uptake system protein 
FeoB (predicted GTPase) 


Protein 


Protein 


ABC-type molybdate transport 
system 


ABC-type sulfate/molybdate 
transport systems 


ABC-type sulfate/molybdate 
transport systems 


Aldo/keto reductases 
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General secretory pathway 
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ABC-type multidrug transport 
system 


ABC-type multidrug transport 
system 


Uncharacterized proteins of 
WD40-like repeat family 


SAM-dependent 
methyltransferases COG0500 
SmtA 


Predicted membrane components 
of an uncharacterized 
iron-regulated ABC-type 
transporter SufB 


Iron-regulated ABC transporter 
ATPase subunit SufC 


Uncharacterized ACR 


Uncharacterized ArCR 


Predicted nucleic acid-binding 
protein 


Metal-dependent hydrolases of 
the beta-lactamase superfamily II 
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Dipeptidyl 

aminopeptidases/acylaminoacyl-peptidases


Predicted P-loop ATPase fused to 
an acetyltransferase 


Uncharacterized ACR 


Uncharacterized ACR 


ABC-type iron/thiamine transport 
systems 


Cytosine deaminase and related 
metal-dependent hydrolases 
COG0402 SsnA 


Aspartyl/asparaginyl-tRNA 
synthetases 


Queuine/archaeosine 
tRNA-ribosyltransferase 


Phosphate/sulphate permeases 


Glycosyltransferases 


Predicted acetyltransferase 


Lhr-like helicases 






CO 






FR 




> 


Pu, 




Pi 


oi 


DAP2 




1 




TbpA 


SsnA 


AsnS 


H 


PitA 




1 


Lhr 


1164 


1260 




vo 


o 

OS 

cn 






CN 


o 

CN 


CN 
VO 
cn 


OS 

to 


r- 

OO 


646488 


650006 


650570 


651073 


652236 


653513 


655065 


657119 


658622 


659823 


660120 


664401 


644598 


647582 


650099 


650656 


651285 


652400 


653784 


655958 


657722 


658797 


659850 


662859 


r-2 


f-2 


f-2 


f-1 


f-3 




cn 


r-3 


m 


r-2 


r-2 


cn 


1738 


r-- 


r- 
r- 
■<4- 


vo 
(N 


O 
OO 


2065 


OO 


2064 


2063 


1737 


1736 


OO 
OO 


1442882 


1439361 


1438794 


1438291 


1437038 


1435830 


1434299 


1431690 


1430736 


1429553 


1429223 


1424960 


1444780 


1441805 


1439300 


1438791 


1438180 


1437035 


1435594 


1433441 


1431656 


1430605 


1429528 


1429132 


646496 


650017 


650584 


651087 


652340 


653548 


655079 


657688 


658642 


659825 


660155 


664418 


644598 


647573 


650078 


650587 


651198 


652343 


653784 


655937 


657722 


658773 


659850 


660246 


OS 

vo 


<N 
OS 
VO 


cn 
OS 
VO 


OS 
VO 


IT) 

OS 


vo 

OS 
vo 


r- 
OS 
vo 


OO 

as 
vo 


a\ 

OS 


o 
o 


o 


CN 
O 












Sugar fermentation stimulation 
protein (uncharacterized) 




Predicted DNA-binding proteins 
with PD1-like DNA-binding
motif 


Cellulase M and related proteins 




Uncharacterized ACR 


Rad3-related DNA helicases 


Adenine-specific DNA methylase 


Uncharacterized membrane 
protein (homolog of Drosophila 
rhomboid) 


Archaeal fructose- 1 


Predicted membrane proteins 


Glycerol dehydrogenase and 
related enzymes 


Uncharacterized ArCR 


Dihydrolipoamide 
acyltransferases 
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Uncharacterized ACR 


Periplasmic serine proteases 
(ClpP class) COG0616 SppA


GTPase subunit of restriction 
endonuclease 


Uncharacterized ArCR 


Uncharacterized ArCR 


DNA topoisomerase VI 


DNA topoisomerase VI 


Predicted RNA-binding protein 
(contains KH domains) 


Predicted serine/threonine protein 
kinases 


Translation initiation factor IF-1 


ABC-type Mn/Zn transport 
systems 


ABC-type Mn2+/Zn2+ transport 
systems 


Ribonuclease HII 


Dolichyl-phosphate-mannose-protein
O-mannosyl transferase
PMT1
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Phosphoribosylaminoimidazole 
carboxylase (NCAIR synthetase) 


Phosphoribosylcarboxyaminoimidazole
(NCAIR) mutase


Cation transport ATPases 


Thiol-disulfide isomerase and 
thioredoxins COG0526 TrxA 


Nitroreductase 


Nitroreductase 


Kef-type K+ transport systems 


Arsenite efflux pump ACR3 and 
related permeases 


Protein-tyrosine-phosphatase 


Aldehyde:ferredoxin 
oxidoreductase 


Aldo/keto reductases 


Uncharacterized ACR 


Uncharacterized conserved 
protein containing a 
ferredoxin-like domain 
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Predicted secreted protein 
containing a PDZ domain 


Predicted transposases 


Predicted nucleic acid-binding 
protein 


NaMN:DMB 
phosphoribosyltransferase 


Pyruvate-formate lyase-activating 
enzyme 


Cobyric acid synthase 


Parvulin-like peptidyl-prolyl 
isomerase 


GTP:adenosylcobinamide-phosphate
guanylyltransferase


Cobalamin-5-phosphate synthase 
(Cobalamin synthase) 


Phosphatidylglycerophosphatase 
A 


Uncharacterized ACR 


Predicted ATPases of PP-loop 
superfamily 










O 




O 












SdrC 




VapC 


CobT 


PflA 


CobQ 


SurA 


1 


x> 
o 
U 


PgpA 


1 




oo 
cs 


as 

VO 
CN 


oo 

CN 


OO 
VO 

VO 


o 


m 


m 


VO 
CN 


CN 
CN 


o 

CN 


OO 

<N 


oo 
cn 

CN 


742076 


743185 


743586 


744598 


745208 


746665 


747171 


748315 


749013 


749438 


749629 


750661 


741944 


742684 


743481 


743596 


744698 


745381 


746862 


747778 


748338 


749042 


749548 


750211 








7. 






1 


1 


CN 
1 






7, 


ON 


VO 


VO 
VO 
oo 


1333 


cn 
On 




1721 


1332 


1720 


2052 


1331 


1330 


1347294 


1346002 


1345769 


1344775 


1344006 


1342552 


1341617 


1341025 


1340345 


1339935 


1339501 


1338664 


1347458 


1346694 


1345954 


1345791 


1344818 


1344009 


1342555 


1341612 


1341040 


1340348 


1339938 


1339170 


742084 


743376 


743609 


744603 


745372 


746826 


747761 


748353 


749033 


749443 


749877 


750714 


741920 


742684 


743424 


743587 


744560 


745369 


746823 


747766 


748338 


749030 


749440 


750208 


CN 
OO 


m 
oo 


oo 


VO 
OO 


VO 

oo 


oo 


oo 
00 


ON 
OO 


O 
ON 


ON 


CN 

ON 


cn 

ON 













Uncharacterized ACR 




Translation elongation factor 
P/translation initiation factor 
eIF-5A 


Uncharacterized ACR 


Na+/H+ antiporter NhaD and 
related arsenite permeases 


Universal stress protein UspA and 
related nucleotide-binding 
proteins 


Arginase/agmatinase/formiminoglutamate
hydrolase


CBS domains 


Kef-type K+ transport systems 


Cation transport ATPases 


Chloride channel protein EriC 


Uncharacterized ACR 


Regulators of 
stationary/sporulation gene 
expression 
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Thiamine monophosphate kinase 


Predicted glycosyltransferases 


Predicted xylanase/chitin 
deacetylase 


Pyruvate-formate lyase-activating 
enzyme 


Uncharacterized protein 


Archaea-specific RecJ-like 
exonuclease 


Gamma-glutamyltranspeptidase 


Uncharacterized Fe-S 
oxidoreductases 


Dimethyladenosine transferase 
(rRNA methylation) 


Predicted RNA-binding protein 


Uncharacterized ArCR 


Ribosomal protein L21E 


Predicted pseudouridylate 
synthase 


Uncharacterized ArCR 
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UDP-N-acetylmuramyl 
pentapeptide 

phosphotransferase/UDP-N-
acetylglucosamine-1-phosphate
transferase 


Deoxyinosine 3' endonuclease
(endonuclease V) 


Translin (RNA-binding protein 
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Glu-tRNAGh amidotransferase 
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Uncharacterized ArCR 


Predicted nucleic acid-binding 
protein 
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amidotransferase subunit E 
(contains GAD domain) 


Bacterial cell division membrane 
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Hydroxymethylglutaryl-CoA 
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Phenylalanyl-tRNA synthetase 
beta subunit 


Pseudouridylate synthase (tRNA 
psi55) 


DNA or RNA helicases of 
superfamily II 


3-polyprenyl-4-hydroxybenzoate 
decarboxylase and related 
decarboxylases 


Acetyltransferases 


Predicted transposase 


Uncharacterized ACR 


Molybdopterin-guanine 
dinucleotide biosynthesis protein 


Predicted periplasmic binding 
protein 


Helicase subunit of the DNA 
excision repair complex 


Integral membrane protein 




Predicted archaeal sugar kinases 
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Predicted ATPase of the PP-loop 
superfamily implicated in cell 
cycle control 


Hydrogenase maturation factor 


Predicted nucleotidyltransferases 


Predicted RNA-binding proteins 




Xaa-Pro aminopeptidase 


Predicted hydrolases of the HAD 
superfamily 


Ribosomal protein L35AE/L33A 


N2 


Uncharacterized membrane 
proteins 


Predicted alternative tryptophan 
synthase beta-subunit (paralog of 
TrpB) 




Nuclease subunit of the 
excinuclease complex 
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Na+-driven multidrug efflux 
pump 


Ribosomal protein L37E 


Small nuclear ribonucleoprotein 
(snRNP) homolog 


Glycosidases




Glycyl-tRNA synthetase 


Predicted permeases 


ABC-type spermidine/putrescine 
transport system 


Predicted DNA modification 
methylase 




Membrane carboxypeptidase 
(penicillin-binding protein) 


Predicted metal-dependent 
hydrolases related to alanyl-tRNA 
synthetase HxxxH domain 








O 




1— » 


Pi 


w 










NorM 


CO 

< 


LSMl 


AmyA 




GRSl 




PotB 






MrcA 










CN 
vo 
CN 




OO 

Os 


On 
CN 


CN 


wo 
m 




ON 
CN 


CN 
O 


851004 


851317 


851574 


854012 




855836 


856650 


856763 


858216 




860266 


861079 


849678 


851134 


851352 


852581 




854129 


855975 


856637 


857238 




860128 


860443 


^ 




CN 


CN 




CN 


m 
ul. 






(N 




>^ 


1707 


1321 


1706 


en 


1320 




oo 
oo 
oo 


2030 


ON 
OO 
OO 






1319 


1238342 


1238053 


1237796 


1235343 


1237495 


1233537 


1232726 


1232580 


1231151 


1230444 


1229038 


1228294 


1239709 


1238244 


1238032 


1237640 


1237560 


1235252 


1233490 


1232741 


1232227 


1230650 


1229298 


1228974 


851036 


851325 


851582 


854035 


851883 


855841 


856652 


856798 


858227 


858934 


860340 


861084 


849669 


851134 


851346 


851738 


851818 


854126 


855888 


856637 


857151 

1 
i 


858728 


860080 


860404 


OO 
ON 
OO 


ON 

as 

00 


O 

o 

Ov 


O 


(N 
O 
ON 


m 
o 

ON 


^ 
o 

ON 


o 

ON 


VO 

o 

C7\ 


o 

On 


OO 
O 
C3N 


o\ 
o 






ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


Na+/H+-dicarboxylate symporters 


Biotin-(acetyl-CoA carboxylase) 
ligase 


Uncharacterized FAD-dependent 
dehydrogenases 


DhnA-type fructose- 1 


Pyruvate carboxylase 


Uncharacterized ACR 


Carbon starvation protein 


Response regulators consisting of 
a CheY-like receiver domain and 
a HTH DNA-binding domain
COG0745 OmpR 


ABC-type 

dipeptide/oligopeptide/nickel 
transport system 


Arsenite transporting ATPase 


Putative aromatic ring 
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hydroxylating enzyme 


Chromosome segregation 
ATPases 


Regulators of 
stationary/sporulation gene 
expression 


Lysophospholipase 


Intracellular septation protein A 


Valyl-tRNA synthetase 


Adenylosuccinate synthase 


Flavodoxin reductases 
(ferredoxin-NADPH reductases) 
family 1 


Nucleoside-diphosphate-sugar 
epimerases COG0451 WcaG


Predicted prefoldin 








Alcohol dehydrogenase IV 
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Ferredoxin 3 


ABC-type multidrug transport 
system 




Predicted Fe-S oxidoreductases 


Survival protein 




Predicted membrane protein 


ATPases of the AAA+ class 


Seryl-tRNA synthetase 


Predicted EndoIII-related 
endonuclease 


Predicted transcriptional 
regulators 


ABC-type multidrug transport 
system 


ABC-type multidrug transport 
system 


3-phosphoglycerate kinase 


Restriction endonuclease 
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Isopropylmalate/homocitrate/citra 
malate synthases 


Methylmalonyl-CoA mutase 


Glutamate-1-semialdehyde
aminotransferase 


ABC-type transport system 
involved in multi-copper enzyme 
maturation 


ABC-type multidrug transport 
system 


Predicted exonuclease of the 
beta-lactamase fold involved in 
RNA processing 


ABC-type multidrug/protein/lipid 
transport system 


Molecular chaperone (small heat 
shock protein) 


Flagellar hook capping protein 


ATPases of the AAA+ class 


Type II restriction enzyme
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Nuclease subunit of the 
excinuclease complex 
(TBP-interacting protein) 


Undecaprenyl pyrophosphate 
synthase 


Carbonic 

anhydrases/acetyltransferases 


Holliday junction resolvase - 
archaeal type 


Predicted membrane protein 


Cellulase M and related proteins 


Predicted Zn-dependent proteases 


Adenylate cyclase 


TPR-repeat-containing proteins 


ABC-type cobalt transport system 


Trk-type K+ transport systems 


Methionine aminopeptidase 




Parvulin-like peptidyl-prolyl 
isomerase 
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PLP-dependent aminotransferases 


Xanthine/uracil permeases 
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Multisubunit Na+/H+ antiporter 


Predicted subunit of the 
Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 
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Dihydrodipicolinate 
synthase/N-acetylneuraminate
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Deoxyribodipyrimidine 
photolyase 


Peptide chain release factor eRF1
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Predicted transcriptional regulator 
with C-terminal CBS domains 


Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT 


Uncharacterized archaeal 
coiled-coil domain 


Exopolyphosphatase-related 
proteins 


Uncharacterized ACR 


Transcriptional regulator of a 
riboflavin/FAD biosynthetic 
operon 


SAM-dependent 
methyltransferases COG0500 
SmtA 


Valyl-tRNA synthetase 


Ribosomal protein S19E (S16A)


Predicted metal-dependent RNase 


DNA-binding protein 
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NAD-dependent DNA ligase 
(contains BRCT domain type II) 


Transcription initiation factor IIB 


5'-3' exonuclease (including
N-terminal domain of PolI)


Molybdopterin biosynthesis 
enzyme 


Predicted subunit of
tRNA(5-methylaminomethyl-2-thiouridylate)
methyltransferase


Putative intracellular 
protease/amidase 


Transcriptional regulators 


Predicted GTPase 


Uracil phosphoribosyltransferase 


Uracil phosphoribosyltransferase 


Na+-driven multidrug efflux 
pump 


Na+-driven multidrug efflux 
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pump 


Predicted nucleic acid-binding 
protein 


Regulators of 
stationary/sporulation gene 
expression 


Phosphoenolpyruvate 
synthase/pyruvate phosphate 
dikinase 




Uncharacterized membrane 
protein 




Superfamily II DNA and RNA 
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Kef-type K+ transport systems 


Uncharacterized ACR 


Asparagine synthase 
(glutamine-hydrolyzing) 


Translation initiation factor 2 
(GTPase) 
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system 


Predicted transcriptional 
regulators 


Cobalamin biosynthesis protein 
CbiM 


Uncharacterized BCR 


Translation initiation factor 2 
(GTPase) 


UDP-N-acetylglucosamine:LPS 
N-acetylglucosamine transferase 






DNA gyrase (topoisomerase II) A 
subunit 


Glycine cleavage system protein 
P (pyridoxal-binding) 


ABC-type sugar transport systems 


ABC-type 

sugar/spermidine/putrescine/iron/thiamine
transport systems
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Guanylate kinase 


ABC-type Fe3+-siderophores 
transport systems 


Excinuclease ATPase subunit 


SH3 domain protein 


3-Methyladenine DNA 
glycosylase 


Regulators of 
stationary/sporulation gene 
expression 


Chromosome segregation 
ATPases 




Glycine cleavage system protein 
P (pyridoxal-binding) 


Glycine cleavage system protein 
P (pyridoxal-binding) 


Flagellar biosynthesis/type III 
secretory pathway ATPase 
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Pyruvateiferredoxin 
oxidoreductase and related 
2-oxoacid:ferredoxin 
oxidoreductases 


Predicted hydrolases of the HAD 
superfamily 


Predicted ICC-like 
phosphoesterases 


Uncharacterized membrane 
protein 


Chromosome segregation 
ATPases 


Integrase 


ABC-type Fe3+-siderophores 
transport systems 


NADH 

dehydrogenase/NADH:ubiquinone
oxidoreductase 75 kD subunit
(chain G) 


Uncharacterized 

NAD(FAD)-dependent 

dehydrogenases 
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Predicted dehydrogenase 


Predicted dehydrogenase 


CBS domains 




Glycerol kinase 


Glycerophosphoryl diester 
phosphodiesterase 


Glycerophosphoryl diester 
phosphodiesterase 
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Predicted deacetylase 


Thymidylate kinase 


Phosphomannomutase 


Phosphoenolpyruvate 
carboxykinase (GTP) 


Glucan phosphorylase 


Na+-dependent transporters of the 
SNF family 
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Glutamyl- and glutaminyl-tRNA 
synthetases 


DNA primase (bacterial type) 


Small primase-like proteins 
(Toprim domain) 


Fe-S oxidoreductases family 2 


Histones H3 and H4 (Histone
A&B)






Ribosomal protein 
L12E/L44/L45/RPP1/RPP2 


Ribosomal protein L10


Ribosomal protein L1




Ribosomal protein L11


Transcription antiterminator 


Protein translocase subunit Sss1


Cell division GTPase


Uncharacterized ArCR 


L-fucose isomerase and related 
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proteins 


Uncharacterized ACR 


Ribose 5-phosphate isomerase 


Sulfate permease and related 
transporters (MFS superfamily) 


Predicted metal-dependent RNase 


Proteasome protease subunit


Archaeal adenylate kinase 


Glutamate dehydrogenase/leucine 
dehydrogenase 


Glutamate dehydrogenase/leucine 
dehydrogenase 


Na+-dependent transporters of the
SNF family 


Parvulin-like peptidyl-prolyl 
isomerase 


Uncharacterized BCR 


Nitrogen regulatory protein PII 


Uncharacterized ACR 
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Methionine synthase 11 
(cobalamin-independent)




Cystathionine 

beta-lyases/cystathionine 

gamma-synthases 


Ribonuclease P subunit Rpp30 


Uncharacterized ArCR 


ABC-type multidrug/protein/lipid
transport system 


ABC-type branched-chain amino 
acid transport systems 


Ribosomal protein L15E 


Xaa-Pro aminopeptidase 


Uncharacterized ACR 


Uncharacterized ACR 


Permease 


Uncharacterized archaeal 
coiled-coil domain 


Type II restriction enzyme 
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Uncharacterized ArCR 


DNA-directed RNA polymerase 
subunit E" 


DNA-directed RNA polymerase 
subunit E' 


Inorganic pyrophosphatase 


Aconitase A 


GAF domain-containing proteins 


Signal peptidase I 


Isopropylmalate/homocitrate/citramalate
synthases


6-pyruvoyl-tetrahydropterin 
synthase


C4-dicarboxylate transporter 


Nucleoside-diphosphate-sugar 
epimerases COG0451 WcaG 


Predicted nucleotidyltransferases 


Uncharacterized ACR 
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Glycosyltransferases involved in 
cell wall biogenesis 


Predicted glycosyltransferases 


Predicted glycosyltransferases 




Outer membrane 
lipoprotein-sorting protein 




Restriction enzymes type I 
helicase subunits and related
helicases 


Uncharacterized 

membrane-associated 

protein/domain 


Predicted archaeal membrane 
protein 


Predicted glycosyltransferases 


Predicted sugar phosphatases of 
the HAD superfamily 


Uncharacterized ACR 


Ribonucleotide reductase alpha 


















CO 




O 






WcaA 


RfaG 


RfaG 




LolA 




HsdR 






RfaG 


NagD 


t 


NrdA 


ON 




<3s 




CO 




m 


m 


lO 

m 


oo 


OS 

oo 
m 


CO 

cn 




1513637 


1514773 


1516792 




1520620 




1522401 


1523624 


1525714 


1526432 


1530284 


1530722 


1536162 


1513040 


1513756 


1516165 




1520431 




1521990 


1523219 


1525372 


1526066 


1529501 


1530557 


1534812 


f-2 






r-2 


f-1 


r-2 


f-3 


f-2 


f-1 


r-3 


f-2 


f-2 


f-3 




r- 

fS 


OO 

in 

CM 


1569 


as 

CN 


1568 


*n 
OO 
OS 


OO 

VO 


O 
VO 
CM 


1905 


ON 
VO 


O 
CN 
VO 


VO 
OO 
OS 


575619 


574543 


572536 


570809 


567778 


569453 


566786 


564711 


563302 


562929 


559083 


558645 


553214 


576392 


575622 


573501 


570868 


569562 


569554 


567643 


566168 


564303 


563312 


559889 


559082 


558484 


1513759 


1514835 


1516842 


1518569 


1521600 


1519925 


1522592 


1524667 


1526076 


1526449 


1530295 


1530733 


1536164 


1512986 


1513756 


1515877 


1518510 


1519816 


1519824 


1521735 


1523210 


1525075 


1526066 


1529489 


1530296 


1530894 


1584 


1585 


1586 


1587 


1588 


1589 


1590 


1591 


1592 


1593 


1594 


1595 


1596 






subunit 


Predicted 

phosphoribosyltransferases 


< 

Ph 

1 


tRNA nucleotidyltransferase 
(CCA-adding enzyme) 


GDP-D-mannose dehydratase


Diphthamide synthase subunit 
DPH2 


Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT 


Uncharacterized BCR 


Uncharacterized ACR 


Isoleucyl-tRNA synthetase 


Cobyric acid synthase 


Sensory transduction histidine 
kinases 




Predicted nucleic acid-binding 
protein
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Uncharacterized ACR 


Cytochrome b subunit of the bc
complex 


Beta-galactosidase 

(exo-beta-D-glucosaminidase)


Predicted phosphosugar 
isomerases 


ABC-type 

dipeptide/oligopeptide/nickel 
transport system 


ABC-type 

dipeptide/oligopeptide/nickel 
transport system 


ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 
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ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


Beta-glucosidase/6-phospho-beta-
glucosidase/beta-galactosidase


Predicted nucleic acid-binding 
protein 


deacetylase 


Chitinase 


Uncharacterized ACR related to 
pyruvate formate-lyase activating 
enzyme 


RNase P subunit P14 and its 
archaeal orthologs


Glycogen synthase 


Predicted transcriptional 
regulators 




Glycosidases 


Maltose-binding periplasmic 
proteins/domains 
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ABC-type sugar transport systems 


Sugar permeases 


Alpha-amylase/alpha-mannosidase
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ABC-type 

sugar/spermidine/putrescine/iron/thiamine
transport systems


Uncharacterized ArCR 
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Metal-dependent hydrolases of 
the beta-lactamase superfamily I 


Uncharacterized membrane 
protein 


Transcriptional regulator 


Acetylornithine
deacetylase/Succinyl-diaminopimelate
desuccinylase and related
deacylases


Threonyl-tRNA synthetase 


Endoglucanase 
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SAM-dependent 
methyltransferases COG0500 
SmtA 


Predicted SAM-dependent 
methyltransferases 


Predicted phosphate-binding 
enzymes 


Short-chain dehydrogenases of 
various substrate specificities 


RecA-superfamily ATPases 
implicated in signal transduction 


Eukaryotic-type DNA primase 


Eukaryotic-type DNA primase 


Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT 




Uncharacterized ArCR 
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2-polyprenylphenol hydroxylase 
and related flavodoxin 
oxidoreductases- COG0543 UbiB 


Predicted Fe-S oxidoreductases 




ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


Alpha-amylase/alpha-mannosidase
(4-alpha-glucanotransferase)


Dolichol kinase 


Uncharacterized archaeal 
membrane protein 


Kef-type K+ transport systems 


Glutamate decarboxylase and 
related PLP-dependent proteins 


Predicted transcriptional 
regulators containing the 
CopG/Arc/MetJ DNA-binding 
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Growth inhibitor 
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Predicted Co/Zn/Cd cation 
transporters 


DNA-directed RNA polymerase 
beta subunit/140 kD subunit (split 
gene in Mjan


Membrane protein involved in the 
export of O-antigen and teichoic
acid 


Predicted membrane-associated 
Zn-dependent proteases 1 


Predicted ATPase of the PP-loop 
superfamily implicated in cell 
cycle control 




Uncharacterized ATPases of the 
AAA superfamily 


Isoleucyl-tRNA synthetase 


Arabinose efflux permease 


Sulfate permease and related 
transporters (MFS superfamily) 
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Transcriptional regulators 


Beta-glucosidase/6-phospho-beta-
glucosidase/beta-galactosidase


Galactose-1-phosphate
uridylyltransferase 




Uncharacterized ACR 


Galactokinase 


Predicted membrane protein 


Uncharacterized ACR 


Arginyl-tRNA synthetase 


Pyrrolidone-carboxylate peptidase 
(N-terminal pyroglutamyl 
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Dehydrogenases (flavoproteins) 


Anaerobic dehydrogenases 


Sugar-binding periplasmic 
proteins/domains 


Zn-dependent carboxypeptidases 
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Predicted site-specific 
integrase-resolvase 


Predicted transposases 


Sugar kinases 


Uncharacterized ACR 


MoxR-like ATPases 


Signal peptidase 


Uncharacterized ArCR 


FKBP-type peptidyl-prolyl 
cis-trans isomerases 2 


Predicted membrane protein 




Predicted membrane protein 


Inteins 


Thiamine 

pyrophosphate-dependent 
dehydrogenases 


Uncharacterized ArCR 


Predicted transposase 


Predicted GTPases 


ABC-type transport systems 
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ABC-type transport systems 




ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


ABC-type 

dipeptide/oligopeptide/nickel 
transport systems 


Predicted N6-adenine-specific 
DNA methylases 


ABC-type phosphate transport 
system 


Sugar phosphate 
isomerases/epimerases 


ABC-type phosphate transport 
system 


ABC-type phosphate transport 
system 


ABC-type phosphate transport 
system 


Phosphate uptake regulator 
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Phosphate uptake regulator 


Glycosyltransferases involved in 
cell wall biogenesis 
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Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT


Predicted ATPase of the AAA 
superfamily 


Metal-dependent hydrolases of 
the beta-lactamase superfamily I 


Acyl-CoA synthetase (NDP 
forming) 


Predicted transcriptional 
regulators 


Uncharacterized ArCR 
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Predicted transcriptional 
regulators 


Glycosidases


Phosphoribosylaminoimidazolesuccinocarboxamide
(SAICAR) synthase


Predicted sugar kinase 


Na+/melibiose symporter and 
related transporters 




Uncharacterized ACR related to 
the C-terminal domain of histone 
macroH2A1


Cytosine deaminase and related 
metal-dependent hydrolases 
COG0402 SsnA 


Predicted transglutaminase-like 
proteases 


Putative glycerate kinase 


Uncharacterized membrane 
protein 


Purine nucleoside phosphorylase 
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Archaeal enzymes of ATP-grasp 
superfamily 


Uncharacterized ACR 


Predicted nuclease of the RecB 
family 


RecA/RadA recombinase 


Kef-type K+ transport systems 
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Lipoate-protein ligase A 


Glycosyltransferases involved in
cell wall biogenesis 


RecA-superfamily ATPases 
implicated in signal transduction 
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Outer membrane receptor proteins 


Small-conductance 
mechanosensitive channel 


Uncharacterized ACR 


Transcription-repair coupling 
factor - superfamily II helicase 
COG1197 Mfd 


Uncharacterized ACR 


Uncharacterized ACR 


Uncharacterized proteins of PilT 
N-term./VapC superfamily


GTPases - translation elongation 
factors COG0050 TufB 




Adenine-specific DNA methylase 


Ribosomal protein S6E (S10)


K+ transporter 


Chromosome segregation 
ATPases 


Predicted GTPases 


Oh 






LK 






Pi 


JE 






»— » 


Oh 


Q 


Pi 


CirA 


MscS 


1 


Mfd 








TufB 






so 

CO 

< 


Kup 


Smc 




CN 

cn 


OS 


to 
»o 


m 


o 

CN 


CN 




VO 




CN 


vo 


CN 




to 

CN 


1751238 


1752785 


1753491 


1755211 


1756041 


1756826 


1757452 


1758730 




1760721 


1762556 


1762844 


1763446 


1764109 


1750896 


1751852 


1752852 


1755019 


1755450 


1756133 


1757053 


1757503 




1760619 


1762181 


1762772 


1763275 


1763593 


r-2 


r-3 


r-2 


r-1 


r-2 


r-3 


r-1 


r-1 


r-2 


f-3 


r-3 


f-2 


r-1 


f-1 


OO 
CN 
to 


oo 
oo 


tN 

>o 


o 


vo 
CN 

ir> 


oo 


ON 


OO 

VO 

1-H 


to 

CN 

to 


CN 

m 
o 


vo 

oo 


vo 


1167 


t- 

CN 


337835 


336585 


335885 


334087 i 


333278 


332454 


331918 


330643 


330380 


328643 


326820 


326532 


325885 


325237 


338962 


337661 


336583 


335910 


333934 


333245 


332349 


331884 


330508 


328984 


327212 


326702 


326535 


325788 


1751543 


1752793 


1753493 


1755291 ! 


1756100 


1756924 


1757460 


1758735 


1758998 


1760735 


1762558 


1762846 


1763493 


1764141 


1750416 


1751717 


1752795 


1753468 


1755444 


1756133 


1757029 


1757494 


1758870 


1760394 


1762166 


1762676 


1762843 


1763590 




CN 
ON 
C^ 


as 


as 


to 

ON 


VO 


o\ 


OO 
C3N 


ON 
ON 


O 
O 
OO 


O 
OO 


CN 
O 
OO 


o 
oo 


1804 






Transcriptional regulators 
Phosphoglycerate dehydrogenase 
and related dehydrogenases 


Phosphate transport regulator 
(distant homolog of PhoU) 


Ketopantoate reductase 




rRNA methylase 


Methylated DNA-protein cysteine 
ABC-type 

cobalamin/Fe3+-siderophores 
transport systems 


ABC-type 

cobalamin/Fe3+-siderophores 
transport systems 


ATPases involved in chromosome 
partitioning 


Uncharacterized BCR 
Transcription initiation factor IIE 


DNA primase (bacterial type) 


ABC-type multidrug transport 
system 




ABC-type multidrug transport 
system 


Sugar kinases 


Predicted regulator of amino acid 
metabolism (contains the ACT 
domain) 


Acylphosphatases 
Uncharacterized protein involved 
in tolerance to divalent cations 


Uncharacterized ACR 


Universal stress protein UspA and 
related nucleotide-binding 
proteins 


Glycine cleavage system T 
protein (aminomethyltransferase) 


Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT 


Signal peptidase I 


Uncharacterized ArCR 


TPR-repeat-containing proteins 


Zn-dependent dipeptidase 


Predicted transcriptional 
regulators 


RecA-superfamily ATPases 
implicated in signal transduction 
Non-ribosomal peptide synthetase 
modules and related proteins 


Sugar permeases 


Predicted archaeal 
methyltransferase 


Uncharacterized ACR 


Predicted membrane protein 


Micrococcal nuclease 
(thermonuclease) homologs 


Dipeptidyl 
aminopeptidases/acylaminoacyl-peptidases 


Permeases of the drug/metabolite 
transporter (DMT) superfamily 
COG0697 RhaT 


Predicted Ser/Thr protein kinase 


Transcriptional regulators 


ABC-type multidrug transport 
system 
ABC-type transport system 
involved in multi-copper enzyme 
maturation 


ABC-type transport system 
involved in multi-copper enzyme 
maturation 


ATP-dependent Lon protease 




Phosphate transport regulator 
(distant homolog of PhoU) 


Phosphate/sulphate permeases 


Predicted phosphohydrolases 


Uncharacterized ACR 
Uncharacterized ArCR 


Ni (Hydrogenase maturation 
transporters (formate 
dehydrogenase) 


Multisubunit Na+/H+ antiporter 




Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 


Predicted subunit of the 
Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 


Multisubunit Na+/H+ antiporter 


Formate hydrogenlyase subunit 
3/Multisubunit Na+/H+ antiporter 


Sugar transferases involved in 
lipopolysaccharide synthesis 




52 




Formate hydrogenlyase subunit 4 






Oh 


Oh 


Ph 


Ph 


PL, 


Oh 


CP 




u 




o 


o 




MnhE 




MnhF 


MnhG 




MnhR 


MnhB 


MnhC 


HyfB 


WcaJ 


1 


HycE 


HycE 










1872555 


1872809 


1873159 


1873440 


1873733 


1874176 


1874535 


1876073 


1876188 


1876993 


1877556 


1878836 


1879833 




1872054 


1872563 


1872817 


1873251 


1873439 


1873741 


1874178 


1874546 


1876080 


1876465 


1877043 


1877567 


1878861 




f-3 


f-2 




f-3 


f-2 




f-3 


f-2 


f-3 




f-3 


f-2 


f-3 




1052 




VO 

o 
m 


1053 


lO 
VO 


o 

CO 


1054 


VO 

vo 


1055 


oo 
o 

c-i 


1056 


VO 


1057 




216821 


216567 


216199 


215936 


215643 


215197 


214841 


213300 


212951 


212383 


211817 


210540 


209543 




217363 


216845 


216570 


216202 


215939 


215646 


215209 


214844 


213307 


212913 


212386 


211820 


210535 




1872557 


1872811 


1873179 


1873442 


1873735 


1874181 


1874537 


1876078 


1876427 


1876995 


1877561 


1878838 


1879835 




1872015 


1872533 


1872808 


1873176 


1873439 


1873732 


1874169 


1874534 


1876071 


1876465 


1876992 


1877558 


1878843 










Formate hydrogenlyase subunit 
6/NADH:ubiquinone 
oxidoreductase 23 kD subunit 
(chain I) 


Uncharacterized ACR 


Uncharacterized ACR 


Uncharacterized ACR 


Uncharacterized protein sharing a 
conserved domain with thiamine 
biosynthesis protein ThiI 


ATPase involved in DNA 
replication 


Predicted membrane-bound 
metal-dependent hydrolases 


Thioredoxin reductase 


PLP-dependent aminotransferases 


Nitrate/nitrite transporter 


Predicted RNA-binding proteins 


Deoxyribose-phosphate aldolase 


Enolase 


o 


CO 










Pi 


O 


w 








o 


NuoI 






1 


1 


HolB 




TrxB 


ArgD 


NarK 




DeoC 


Eno 




1880195 


1880729 


1881246 


1881745 


1882261 


1883525 


1884074 


1885144 


1886607 


1886980 


1887549 


1888216 


1890020 


1879847 


1880270 


1880790 


1881289 


1881790 


1882352 


1883549 


1884157 


1885290 


1886914 


1887291 


1887553 


1888727 


f-2 


r-3 


r-2 


r-1 


i. 


f-2 


f-2 


f-1 


f-3 


f-1 


r-2 


r-1 


f-2 


oo 
r— 
VO 


1859 


1501 


1140 


1139 


ON 


O 
OO 
VO 


OS 
O 

m 


1058 


O 

m 


1500 


1138 


OO 

VO 


209115 


208581 


208100 


207619 


207106 


205836 


205302 


204229 


202751 


202108 


201818 


201160 


199353 


209546 


209114 


208594 


208107 


207588 


207044 


205835 


205221 


204097 


101101 


202111 


201834 


200654 


1880263 


1880797 


1881278 


1881759 


1882272 


1883542 


1884076 


1885149 


1886627 


1887270 


1887560 


1888218 


1890025 


1879832 


1880264 


1880784 


1881271 


1881790 


1882334 


1883543 


1884157 


1885281 


1886671 


1887267 


1887544 


1888724 


1939 


1940 


1941 


1942 


1943 


1944 


1945 


1946 


1947 


1948 


1949 


1950 


1951 



[Table 2, continued; the entries on these two pages are illegible in the source]



Ribosomal protein L22 


Transcriptional regulators 


SAM-dependent 
methyltransferases related to 
tRNA 

(uracil-5-)-methyltransferase 


Transcriptional regulators 


ATPases involved in chromosome 
partitioning 


Orotate phosphoribosyltransferase 


Predicted metal-dependent 
membrane protease 


ATP-dependent DNA ligase 


Predicted archaeal kinases of the 
sugar kinase superfamily 


CD 

t: 

o 

1 

+ 

% 
2 


Uracil-DNA glycosylase 


Predicted transcriptional 
regulators 


Predicted Fe-S oxidoreductases 






I—* 




Q 






h-) 










a; 


> 
Pi 


Lrp 


TrmA 


MarR 


1 


PyrE 


1 


CDC9 


1 


NhaC 


1 


Spo0J 


i 


CO 
CO 


CM 
<N 


5^ 
«o 


oo 


O 


tN 
CS 


CN 
CO 


o 
oo 
oo 


ro 


o 
o 


O 
CO 


CO 


1040 


1913922 


1914810 


1916193 


1916402 


1917262 


1917847 


1918401 


1920385 


1921329 


1923051 


1923968 


1924255 


1926233 


1913595 


1914387 


1914954 


1916282 


1916572 


1917334 


1918230 


1918711 


1920429 


1921407 


1923425 


1924060 


1924478 




r-2 


r-2 


f-2 










r-2 


r-2 


CO 
1 

l-t 


1 


f-2 


1494 


1493 


1492 


OO 
OO 

vo 


CO 


1132 


1061 


1131 


1491 


1490 


1856 


1130 


as 

OO 
VO 


175328 


174566 


173174 


172899 


172027 


171499 


170669 


168988 


168047 


166313 


165408 


165061 


163128 


175906 


174991 


174496 


173126 


172857 


172068 


171163 


170685 


168949 


167971 


166001 


165411 


164900 


1914050 


1914812 


1916204 


1916479 


1917351 


1917879 


1918709 


1920390 


1921331 


1923065 


1923970 


1924317 


1926250 


1913472 


1914387 


1914882 


1916252 


1916521 


1917310 


1918215 


1918693 


1920429 


1921407 


1923377 


1923967 


1924478 


1978 


1979 


1980 

1 


1981 


1982 


1983 


1984 


1985 


1986 


1987 


1988 


1989 


1990 






Transcriptional regulator 


Methyl-accepting chemotaxis 
protein 


Lysyl-tRNA synthetase class II 


Putative effector of murein 
hydrolase LrgA 


Putative effector of murein 
Uncharacterized ArCR 


Uncharacterized ACR 
Uncharacterized ATPases of the 
PP-loop superfamily 


Uncharacterized membrane 
protein 


Aspartate carbamoyltransferase 
regulatory subunit 


Aspartate carbamoyltransferase 


Uncharacterized ACR 
Predicted metal-binding domain 

In Table 2, the reading frames f-1 through f-3 refer to open
reading frames in the sense strand, and r-1 through r-3 refer to
open reading frames in the antisense strand. In the classification,
J refers to polypeptides relating to translation, ribosomal
structure or biogenesis; K refers to polypeptides relating to
transcription; L refers to polypeptides relating to DNA
replication, recombination or repair; D refers to polypeptides
relating to chromosome partitioning; O refers to polypeptides
relating to post-translational events, protein turnover or
chaperone proteins; M refers to polypeptides relating to cell
envelope biogenesis or outer membranes; N refers to polypeptides
relating to cellular movement or secretion; P refers to
polypeptides relating to inorganic ion transport or metabolism;
T refers to polypeptides relating to signaling mechanisms; C
refers to polypeptides relating to energy production and
conversion; G refers to polypeptides relating to carbohydrate
transport and metabolism; E refers to polypeptides relating to
amino acid transport and metabolism; F refers to polypeptides
relating to nucleotide transport and metabolism; H refers to
polypeptides relating to coenzyme metabolism; I refers to
polypeptides relating to lipid metabolism; Q refers to
polypeptides relating to secondary metabolite biosynthesis,
transport or catabolism; R refers to polypeptides for which only
a general function is predicted; and S refers to polypeptides
with an unknown function. The classification is provisional, and
two or more classifications may be appropriate for one
polypeptide; in such cases, both letters are given.
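
As an illustration only, the one-letter classification described above can be held in a lookup table. The following is a minimal Python sketch, not part of the disclosed method; the names `CATEGORY` and `describe_classification` are hypothetical, and the descriptions are abbreviated from the paragraph above.

```python
# Hypothetical lookup table for the classification letters of Table 2.
# The short descriptions paraphrase the paragraph above.
CATEGORY = {
    "J": "translation, ribosomal structure or biogenesis",
    "K": "transcription",
    "L": "DNA replication, recombination or repair",
    "D": "chromosome partitioning",
    "O": "post-translational events, protein turnover or chaperones",
    "M": "cell envelope biogenesis or outer membranes",
    "N": "cellular movement or secretion",
    "P": "inorganic ion transport or metabolism",
    "T": "signaling mechanisms",
    "C": "energy production and conversion",
    "G": "carbohydrate transport and metabolism",
    "E": "amino acid transport and metabolism",
    "F": "nucleotide transport and metabolism",
    "H": "coenzyme metabolism",
    "I": "lipid metabolism",
    "Q": "secondary metabolite biosynthesis, transport or catabolism",
    "R": "general function prediction only",
    "S": "unknown function",
}


def describe_classification(code):
    """Expand a classification such as 'KL' into one description per letter.

    A polypeptide may carry two or more letters, so the result is a list.
    An unrecognized letter raises KeyError.
    """
    return [CATEGORY[letter] for letter in code.strip().upper()]
```

For example, `describe_classification("KL")` returns the descriptions for transcription and for DNA replication, recombination or repair, in that order.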



(Biomolecule chip) 
In another aspect, the present invention provides a
biomolecule chip. The present biomolecule chip comprises a
substrate and, located thereon, at least one nucleic acid
molecule having a sequence of at least eight contiguous or
non-contiguous nucleotides of the sequence set forth in SEQ ID
NO: 1 or 1087, or a variant thereof.

Accordingly, in one embodiment, the present invention
provides a nucleic acid molecule comprising (a) a sequence set
forth in SEQ ID NO: 1 or 1087, or a complementary sequence or
fragment thereof; (b) a polynucleotide encoding a polypeptide
consisting of an amino acid sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a fragment thereof; (c) a
polynucleotide encoding a polypeptide having an amino acid
sequence selected from the group consisting of SEQ ID NOs: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157, or a
variant thereof having at least one mutation selected from the
group consisting of one or more amino acid substitutions,
additions, and deletions, wherein the variant polypeptide has
biological activity; (d) a polynucleotide capable of hybridizing
to a polynucleotide of any of (a) to (c), and encoding a
polypeptide having an amino acid sequence having at least 70 %
identity to any one of the polypeptides of (a) to (c), wherein
the polypeptide has biological activity.

In one preferred embodiment, the number of substitutions,
additions and deletions described in (c) above may be limited
to, for example, preferably 50 or less, 40 or less, 30 or less,
20 or less, 15 or less, 10 or less, 9 or less, 8 or less, 7 or
less, 6 or less, 5 or less, 4 or less, 3 or less, or 2 or less.
The number of substitutions, additions and deletions is
preferably small, but may be large as long as the biological
activity is maintained (preferably, the activity is similar to
or substantially the same as that set forth in Table 2, or is an
abnormal activity thereof (for example, inhibition of normal
biological activity)).

In other preferable embodiments, the biological activities
possessed by the polypeptides of the present invention include,
but are not limited to, for example, interactions with specific
antibodies against at least one polypeptide selected from the
group consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157; a biological activity listed
in Table 2; and the like. These may be measured by, for example,
immunological assays, labeling assays and the like.

In other preferable embodiments, allelic gene variants as
described in (d) above advantageously have at least 99 %
homology to the nucleic acid sequences set forth in SEQ ID NO: 1
or 1087, or a portion thereof (for example, when the reading
frame of Table 2 is f-1, f-2 or f-3, the nucleic acid molecule
has a sequence from the position of nucleic acid number (sense
strand, start) of SEQ ID NO: 1 of Table 2, to the position of
nucleic acid number (sense strand, stop), or when the reading
frame of Table 2 is r-1, r-2 or r-3, the nucleic acid molecule
has a sequence from the position of nucleic acid number
(antisense strand, start) of SEQ ID NO: 1087 of Table 2, to the
position of nucleic acid number (antisense strand, stop)).


If a gene sequence database for the subject species is
available, the above-mentioned species homologs may be
identified by searching the database using a gene sequence of
the present invention as a query sequence. Alternatively, a
nucleic acid sequence of the present invention, or a portion
thereof (for example, when the reading frame of Table 2 is f-1,
f-2 or f-3, the nucleic acid molecule has a sequence from the
position of nucleic acid number (sense strand, start) of SEQ ID
NO: 1 of Table 2, to the position of nucleic acid number (sense
strand, stop), or when the reading frame of Table 2 is r-1, r-2
or r-3, the nucleic acid molecule has a sequence from the
position of nucleic acid number (antisense strand, start) of SEQ
ID NO: 1087 of Table 2, to the position of nucleic acid number
(antisense strand, stop)) may be used as a probe or primer to
screen a genetic library of the subject species in order to
identify such homologs. Such identification methods are well
known in the art, and are also described in references cited
herein. Species homologs preferably have at least 30 % homology
to a nucleic acid sequence set forth in SEQ ID NO: 1 or 1087, or
a portion thereof (for example, when the reading frame of Table
2 is f-1, f-2 or f-3, the nucleic acid molecule has a sequence
from the position of nucleic acid number (sense strand, start)
of SEQ ID NO: 1 of Table 2, to the position of nucleic acid
number (sense strand, stop), or when the reading frame of Table
2 is r-1, r-2 or r-3, the nucleic acid molecule has a sequence
from the position of nucleic acid number (antisense strand,
start) of SEQ ID NO: 1087 of Table 2, to the position of nucleic
acid number (antisense strand, stop)). Preferably, the species
homologs of the present invention may have at least about 40 %
homology, at least about 50 % homology, at least about 60 %
homology, at least about 70 % homology, at least about 80 %
homology, at least about 90 % homology, at least about 95 %
homology, or at least about 98 % homology with the
above-mentioned standard sequence.
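
The homology thresholds listed above can be illustrated with a simple percent-identity calculation. The sketch below is only one possible reading: it assumes the two sequences have already been aligned by some external tool (with `-` marking gaps), and it counts identical columns over the alignment length. How homology is actually computed is not specified in this passage, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the
    same residue. Columns containing a gap ('-') count as mismatches.

    Assumes the inputs are the two rows of an existing pairwise alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a, aligned_b, threshold=30.0):
    """True when the percent identity reaches the given threshold,
    e.g. 30.0 for the lowest species-homolog level named above."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

With this definition, two eight-nucleotide sequences differing at one position score 87.5 %, which would satisfy the 80 % level but not the 90 % level.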

In preferable embodiments, identity to at least one
polynucleotide of the above (a) to (e), or to the complementary
sequence thereto, may be at least about 80 %, more preferably at
least 90 %, still more preferably at least about 98 %, most
preferably at least about 99 %. Most preferably, the gene
sequence of the present invention has a sequence 100 % identical
to a nucleic acid sequence set forth in SEQ ID NO: 1 or 1087, or
a portion thereof (for example, when the reading frame of Table
2 is f-1, f-2 or f-3, the nucleic acid molecule has a sequence
from the position of nucleic acid number (sense strand, start)
of SEQ ID NO: 1 of Table 2, to the position of nucleic acid
number (sense strand, stop), or when the reading frame of Table
2 is r-1, r-2 or r-3, the nucleic acid molecule has a sequence
from the position of nucleic acid number (antisense strand,
start) of SEQ ID NO: 1087 of Table 2, to the position of nucleic
acid number (antisense strand, stop)).

In a preferred embodiment, the nucleic acid molecule of the
present invention encoding the gene of the present invention may
have a length of at least 8 contiguous nucleotides. The
appropriate nucleotide length of the nucleic acid molecule of
the present invention may vary depending on the purpose of use
of the present invention. More preferably, the nucleic acid
molecule of the present invention may have a length of at least
10 contiguous nucleotides, even more preferably at least 15
contiguous nucleotides, still even more preferably at least 20
contiguous nucleotides, and yet still even more preferably at
least 30 contiguous or non-contiguous nucleotides. The lower
limit of the nucleotide length may also be a value between the
above-specified numbers (e.g., 9, 11, 12, 13, 14, 16, and the
like) or above the above-specified numbers (e.g., 21, 22, ... 30,
and the like). The upper limit of the length of the
polynucleotide of the present invention may be greater than or
equal to the full length of the sequence as set forth in SEQ ID
NO: 1, as long as the polynucleotide can be used for the
intended purpose (e.g., as an antisense, RNAi, marker, primer or
probe, or as a molecule capable of interacting with a given
agent). Alternatively, when the nucleic acid molecule of the
present invention is used as a primer, the nucleic acid molecule
typically may have a nucleotide length of at least about 8,
preferably a nucleotide length of about 10. When used as a
probe, the nucleic acid molecule typically may have a nucleotide
length of at least about 15, and preferably a nucleotide length
of about 17.


In one embodiment, the nucleic acid molecule encoding the
gene of the present invention comprises the entire range of the
open reading frame of SEQ ID NO: 1. More preferably, the nucleic
acid molecule of the present invention consists of at least one
sequence set forth in SEQ ID NO: 1 or 1087, or a portion thereof
(for example, when the reading frame of Table 2 is f-1, f-2 or
f-3, the nucleic acid molecule has a sequence from the position
of nucleic acid number (sense strand, start) of SEQ ID NO: 1 of
Table 2, to the position of nucleic acid number (sense strand,
stop), or when the reading frame of Table 2 is r-1, r-2 or r-3,
the nucleic acid molecule has a sequence from the position of
nucleic acid number (antisense strand, start) of SEQ ID NO: 1087
of Table 2, to the position of nucleic acid number (antisense
strand, stop)).
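
The coordinate convention in the parenthetical above (sense-strand positions of SEQ ID NO: 1 for frames f-1 to f-3, antisense-strand positions of SEQ ID NO: 1087 for frames r-1 to r-3) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: positions are taken to be 1-based and inclusive, start is taken to be no greater than stop on the strand being read, and the function names are hypothetical. The toy sequences in the usage note stand in for SEQ ID NO: 1 and 1087 and are not taken from the sequence listing.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (used only to build toy data)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def extract_orf(frame, start, stop, sense_seq, antisense_seq):
    """Return the open reading frame for one row of Table 2.

    frame          'f-1'..'f-3' (sense) or 'r-1'..'r-3' (antisense)
    start, stop    positions on the matching strand; assumed here to be
                   1-based and inclusive, with start <= stop
    sense_seq      stands in for SEQ ID NO: 1
    antisense_seq  stands in for SEQ ID NO: 1087
    """
    strand, _, number = frame.partition("-")
    if strand not in ("f", "r") or number not in ("1", "2", "3"):
        raise ValueError("unknown reading frame: %r" % (frame,))
    template = sense_seq if strand == "f" else antisense_seq
    if not (1 <= start <= stop <= len(template)):
        raise ValueError("coordinates fall outside the sequence")
    # Convert the 1-based inclusive range to a Python slice.
    return template[start - 1:stop]
```

For a toy sense strand `ATGCCCTAAGG`, `extract_orf("f-1", 1, 9, sense, reverse_complement(sense))` yields `ATGCCCTAA`; an r-frame row is read from the antisense string in the same way.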

Accordingly, the biomolecule chip of the present invention
preferably uses nucleic acid molecules or variants thereof which
encompass the sequence set forth in SEQ ID NO: 1 or 1087. By
using nucleic acid molecules of such an encompassing nature, it
is possible to analyze functions of the genome in an exhaustive
manner. This became possible only once the entire sequence of
the genome had been read; it has therefore not been attained by
prior art technologies, and represents a significant effect.

In other embodiments, the nucleic acid molecules, or
variants thereof of the present invention, to be used in the
biomolecule chip, comprise any open reading frame as set forth
in SEQ ID NO: 1 or 1087. The ability to select any open reading
frame on the genome should be recognized as a significant
effect, as this has not been possible using prior art
technology. In particular, it should be noted that analysis of
the entire genome of an organism living in high temperature
environments, such as at 90 °C, is possible.

In another embodiment, the nucleic acid molecules or
variants thereof, to be used in the biomolecule chip of the
present invention, preferably comprise substantially all the
open reading frames set forth in SEQ ID NO: 1 or 1087. As used
herein, the term "substantially all" refers to a number
sufficient for global genomic analysis. Accordingly, the term
"substantially all" does not necessarily mean all, and depending
on the purpose of interest, those skilled in the art may select
an appropriate number. Exemplary values of "substantially all"
include, but are not limited to, for example, at least about
30 %, preferably at least about 40 %, more preferably at least
about 50 %, still preferably at least about 80 %, still more
preferably at least about 90 %, yet more preferably at least
about 95 %, at least about 96 %, at least about 97 %, at least
about 98 %, at least about 99 %, and the like, of the total
number of open reading frames. In other typical examples of the
present invention, substantially all may be about 900 genes
whose function has already been identified in the present
application. The ability to analyze substantially all the open
reading frames is not attainable using prior art technologies.
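
As a worked illustration of the percentages above, the fraction of open reading frames represented on a chip can be checked against a chosen level. The sketch below is hypothetical: the identifiers, the function name and the default level of 30 % are assumptions made for the example, and the appropriate level is left to those skilled in the art as stated above.

```python
def covers_substantially_all(on_chip, all_orfs, min_fraction=0.30):
    """True when the chip represents at least `min_fraction` of the
    open reading frames.

    on_chip       identifiers of open reading frames present on the chip
    all_orfs      identifiers of all open reading frames considered
    min_fraction  chosen level, e.g. 0.30, 0.50, 0.90 or 0.95
    """
    all_ids = set(all_orfs)
    if not all_ids:
        raise ValueError("no open reading frames given")
    covered = len(all_ids & set(on_chip))
    return covered / len(all_ids) >= min_fraction
```

For example, a chip carrying 4 of 10 open reading frames passes at the 30 % level, while one carrying 9 of 10 does not pass at the 95 % level.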

Accordingly, in another preferable embodiment, the nucleic
acid molecule or variants thereof, to be used in the biomolecule
chip of the present invention, comprises a sequence encoding at
least one sequence selected from the group consisting of SEQ ID
NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157.

In other preferable embodiments, the nucleic acid molecules
or variants thereof comprise substantially all sequences
encoding sequences selected from the group consisting of SEQ ID
NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157.






In more preferable embodiments, the nucleic acid molecule
or the variant thereof, to be used as the biomolecule of the
present invention, comprises at least an eight contiguous
nucleotide length of substantially all the sequences encoding
sequences selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157.
The sequences may be selected in consideration of a variety of
factors as described above. A nucleic acid molecule at least
eight contiguous nucleotides in length may comprise a sequence
unique to the hyperthermophilic archaebacteria, and thus is
advantageous for such analyses.
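
One way to picture a sequence of eight contiguous nucleotides that is "unique" is to list the eight-nucleotide windows of a target sequence that do not occur in any comparison sequence. The following Python sketch shows that idea only; the choice of comparison sequences, the function names and the exact-match criterion are assumptions made for illustration and are not taken from the specification.

```python
def kmers(seq, k=8):
    """All windows of k contiguous nucleotides in seq, as a set."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(target, background, k=8):
    """Windows of length k found in `target` but in none of the
    `background` sequences (exact matching only)."""
    seen = set()
    for other in background:
        seen |= kmers(other, k)
    return kmers(target, k) - seen
```

Raising `k` to 15 or 30 gives the longer fragments discussed in the following embodiments; longer windows are less likely to recur by chance in a comparison sequence.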

In another preferable embodiment, the nucleic acid molecule
or the variant thereof, to be used as the biomolecule of the
present invention, comprises at least a fifteen contiguous
nucleotide length of substantially all the sequences encoding
sequences selected from the group consisting of SEQ ID NOs:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157. A
nucleic acid molecule at least fifteen nucleotides in length
allows substantially specific identification of sequences unique
to the hyperthermophilic archaebacteria, and thus is
advantageous for such analyses.


In another more preferable embodiment, the nucleic acid
molecule or the variant thereof, to be used in the biomolecule
chip of the present invention, comprises at least a thirty
contiguous or non-contiguous nucleotide length of substantially
all the sequences encoding sequences selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157. A nucleic acid molecule at least thirty
contiguous or non-contiguous nucleotides in length allows
substantially specific identification of sequences unique to the
hyperthermophilic archaebacteria, even when used as a probe, and
thus is advantageous for such analyses.

In another more preferable embodiment, the nucleic acid
molecule or the variant thereof, to be used in the biomolecule
chip of the present invention, comprises substantially all the
sequences encoding sequences selected from the group consisting
of SEQ ID NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, or sequences with one or more amino acid
substitutions, additions and/or deletions thereto. Such
sequences allow exhaustive analyses of nucleic acid molecules
encoding polypeptides included or suspected to be included in an
archaebacterium, and thus are advantageous for such analyses.

In another more preferable embodiment, the nucleic acid
molecule or the variant thereof, to be used in the biomolecule
chip of the present invention, comprises at least an eight
contiguous nucleotide length of substantially all the sequences
encoding sequences selected from the group consisting of SEQ ID
NOs: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or sequences with one or more amino acid
substitutions, additions and/or deletions thereto. Chips
containing such sequences may be used for analysis of the
behavior of all genes.

In another more preferable embodiment, the nucleic
acid molecule or the variant thereof to be used in the
biomolecule chip of the present invention, when the
reading frame of Table 2 is f-1, f-2 or f-3, has a
sequence from the position of nucleic acid number
(sense strand, start) of SEQ ID NO: 1 of Table 2 to the
position of nucleic acid number (sense strand, stop), or a
sequence having at least 70 % homology thereto; or, when the
reading frame of Table 2 is r-1, r-2 or r-3, has a
sequence from the position of nucleic
acid number (antisense strand, start) of SEQ ID NO: 1087
of Table 2 to the position of nucleic acid number (antisense
strand, stop), or a sequence having at least 70 % homology
thereto. Such sequences contain open reading frames
actually possessed by hyperthermophilic archaebacteria
and thus provide an accurate assay at the genomic level.
Thus, the present embodiment may be used for global analysis
at such a genomic level.
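
As an illustration only, the selection of a reading frame
sequence by strand and by start and stop position may be
sketched as follows. The sketch assumes that the positions
are 1-based and inclusive and are counted along the strand
being read, and it derives the antisense strand as the
reverse complement of the sense strand; these conventions,
the function names and the short example sequence are
assumptions, not details taken from Table 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def extract_reading_frame(sense, frame, start, stop):
    """Slice start..stop (1-based, inclusive) from the strand named by `frame`.

    Frames beginning with "f" are read on the sense strand and frames
    beginning with "r" on the antisense strand (assumed convention).
    """
    if frame.startswith("f"):
        strand = sense.upper()
    elif frame.startswith("r"):
        strand = reverse_complement(sense)
    else:
        raise ValueError("frame must look like f-1..f-3 or r-1..r-3")
    if not 1 <= start <= stop <= len(strand):
        raise ValueError("start/stop outside the strand")
    return strand[start - 1:stop]


# Hypothetical 13-nucleotide "genome" used only for illustration.
toy_sense = "CCATGAAATAGGG"
forward_orf = extract_reading_frame(toy_sense, "f-3", 3, 11)
```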

In another embodiment, the substrate comprising the
biomolecule of the present invention is addressable. Giving
addresses facilitates the analyses of all of the nucleic
acid molecules. Methods for addressing are well known in
the art.

In another aspect, the present invention provides a 
biomolecule chip with a polypeptide or a variant thereof, 
having at least an amino acid sequence selected from the 
group consisting of SEQ ID NO: 2-341, 343-722, 724-1086, 
1088-1468, 1470-1837 and 1839-2157, or a sequence having
at least 70 % homology thereto, located therein. 

Accordingly, in one embodiment, the present invention
provides a polypeptide selected from: (a) a polypeptide
consisting of an amino acid sequence selected from the group
consisting of SEQ ID NO: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a fragment thereof; (b) a
polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NO: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or a variant thereof
having at least one mutation selected from the group
consisting of one or more amino acid substitutions,
additions, and deletions, wherein the variant polypeptide has
a biological activity; (c) a polypeptide encoded by a nucleic
acid molecule, or splicing variants or allelic variants
thereof, wherein the nucleic acid molecule or the variant
thereof, when the reading frame of Table 2 is f-1, f-2 or
f-3, has a sequence from the position of nucleic acid number
(sense strand, start) of SEQ ID NO: 1 of Table 2 to the
position of nucleic acid number (sense strand, stop), or
when the reading frame of Table 2 is r-1, r-2 or r-3, has a
sequence from the position of nucleic acid number (antisense
strand, start) of SEQ ID NO: 1087 of Table 2 to the position
of nucleic acid number (antisense strand, stop); (d) a
polypeptide of at least one species homolog of an amino acid
sequence selected from the group consisting of SEQ ID NO:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157; or (e) a polypeptide having an amino acid
sequence having at least 70% identity to any one of the
polypeptides of (a) to (c), wherein the polypeptide has
biological activity.

In one preferred embodiment, the number of
substitutions, additions and deletions described in (b)
above may be limited to, for example, preferably 50 or less,
40 or less, 30 or less, 20 or less, 15 or less, 10 or less,
9 or less, 8 or less, 7 or less, 6 or less, 5 or less, 4
or less, 3 or less, or 2 or less. The number of substitutions,
additions and deletions is preferably small, but may be large
as long as biological activity is maintained (preferably,
the activity is similar to or substantially the same as
the biological activity of the wild-type form of a
polypeptide having an amino acid sequence selected from the
group consisting of SEQ ID NO: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or an abnormal activity
of a polypeptide having an amino acid sequence selected from
the group consisting of SEQ ID NO: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157).

In another preferred embodiment, the above-described
splicing or allelic variants of the polypeptides described
in (c) above preferably have at least about 99% homology
to a polypeptide having an amino acid sequence selected from
the group consisting of SEQ ID NO: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157.

In another preferable embodiment, the
above-mentioned species homologs preferably have at least
about 30 % homology to a polypeptide having an amino acid
sequence selected from the group consisting of SEQ ID NO:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157. Preferably, the species homologs have homology
to the above standard sequence of at least about 40 %,
at least about 50 %, at least about 60 %, at least about
70 %, at least about 80 %, at least about 90 %, at least
about 95 %, or at least about 98 %.

When a genetic sequence database of the species exists,
the above species homologs may be identified by performing
a search against the database using a polypeptide having
an amino acid sequence selected from the group consisting
of SEQ ID NO: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, as a query sequence. Alternatively, the
entire amino acid sequence of a polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID
NO: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a portion thereof, may be used as a probe or
primer for screening a genetic library of the species. Such
methods for identification are well known in the art, and
are described in the references cited herein. Species
homologs preferably have at least about 30 % homology to:
when the reading frame of Table 2 is f-1, f-2 or f-3, a
sequence from the position of nucleic acid number (sense
strand, start) of SEQ ID NO: 1 of Table 2 to the position of
nucleic acid number (sense strand, stop); when the reading
frame of Table 2 is r-1, r-2 or r-3, a sequence from the
position of nucleic acid number (antisense strand, start) of
SEQ ID NO: 1087 of Table 2 to the position of nucleic acid
number (antisense strand, stop); or an amino acid sequence
selected from the group consisting of SEQ ID NO: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157.
Preferably, the species homologs may have homology to the
above standard sequence of at least about 40 %, at least
about 50 %, at least about 60 %, at least about 70 %, at
least about 80 %, at least about 90 %, at least about 95 %,
or at least about 98 %.
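
For illustration only, a candidate homolog that has already
been aligned to a reference sequence may be checked against
one of the above thresholds as sketched below. The sketch
assumes that homology is scored as percent identity over
aligned columns, with "-" marking a gap; this scoring rule,
the function names and the example peptides are assumptions
made for the example and are not a definition of homology.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns carrying the same residue.

    Columns where both sequences have a gap are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a, aligned_b, threshold=30.0):
    """True when the pair reaches the given percent threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Hypothetical aligned peptides: 4 identical columns out of 6.
identity = percent_identity("MKV-LA", "MKIGLA")
```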

In another preferable embodiment, the biological
activity possessed by the variant polypeptide in (e) above
includes, but is not limited to, for example, interaction
with an antibody specific to the polypeptide having an amino
acid sequence selected from the group consisting of SEQ ID
NO: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a fragment thereof; an enzymatic function as
described in Table 2; and the like. Such functions may be
measured by enzymatic assays, immunological assays,
fluorescence assays and the like.

In preferable embodiments, the above-described
homology to any one of the polypeptides described in (a)
to (d) above may be at least about 80%, more preferably at
least about 90%, even more preferably at least about 98%,
and most preferably at least about 99%. Most preferably,
the genetic product of the present invention is a sequence
consisting of at least one amino acid sequence selected from
the group consisting of SEQ ID NO: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157.

The polypeptide of the present invention
typically has a sequence of at least 3 contiguous amino acids.
The amino acid length of the polypeptide of the present
invention may be short as long as the peptide is suitable
for an intended application, but preferably a longer
sequence may be used. Therefore, the amino acid length may
be preferably at least 4, more preferably at least 5, at
least 6, at least 7, at least 8, at least 9 or at least
10, even more preferably at least 15, and still even more
preferably at least 20. These lower limits of the amino acid
length may lie between the above-specified numbers
(e.g., 11, 12, 13, 14, 16, and the like) or above the
above-specified numbers (e.g., 21, 22, 30, and the like).
The upper limit of the length of the polypeptide of the
present invention may be greater than or equal to the full
length of an amino acid sequence
selected from the group consisting of SEQ ID NO: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157 as
long as the peptide is capable of interacting with a given
agent. As used herein, more preferable forms and
constitutions with respect to the sequence to be included
may take any embodiment described herein above for
preferable forms and constitutions.

The genetic product of the polypeptide form of the
present invention is preferably labeled or may be capable
of being labeled. Such a genetic product, which is labeled
or may be capable of being labeled, may be used to measure
the antibody levels against the genetic product, thereby
allowing indirect measurement of the level of expression
of the genetic product.

In another preferable embodiment, the polypeptide or
the variant thereof to be located on a support of the
biomolecule chip of the present invention has a length of
at least three contiguous amino acids of an amino acid
sequence selected from the group consisting of SEQ ID NO:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto. By having a sequence of at least three contiguous
amino acids, it is possible to constitute a specific
epitope. As used herein, preferable forms of the sequence
to be used may take any form described herein above.

In preferable embodiments, the polypeptide or the
variant thereof to be located on a support of the biomolecule
chip of the present invention has a length of at least eight
contiguous amino acids of an amino acid sequence selected
from the group consisting of SEQ ID NO: 2-341, 343-722,
724-1086, 1088-1468, 1470-1837 and 1839-2157, or a sequence
having at least 70 % homology thereto. By having a sequence
of at least eight contiguous amino acids, it is possible
to constitute specific epitopes in a more efficient manner.
As used herein, preferable forms and constitutions of the
sequence to be used may take any form described herein above.

In preferable embodiments, the polypeptide or the
variant thereof to be located on a support of the biomolecule
chip of the present invention has a length of at least three
contiguous or non-contiguous amino acids of an amino acid
sequence selected from the group consisting of SEQ ID NO:
2-341, 343-722, 724-1086, 1088-1468, 1470-1837 and
1839-2157, or a sequence having at least 70 % homology
thereto, and has a biological function. As used herein,
the biological activities preferably include a function
described in Table 2. In another embodiment, the
biological activity includes epitope activity. As used
herein, preferable forms and constitutions relating to
preferable sequences may take advantage of any of the
forms and constitutions described herein above.

In another aspect, the present invention provides a 
storage medium having stored therein, information about a 
nucleic acid sequence of a nucleic acid molecule having a 
sequence of at least eight contiguous or non-contiguous 
nucleotides of the sequence set forth in SEQ ID NOs: 1 or 
1087, or a variant thereof. As used herein, the information 
about the nucleic acid sequence includes, in addition to 
information about the nucleic acid sequence per se, 
information relating to that set forth in a conventional 
sequence listing. Such additional information includes, 
but is not limited to, for example, coding region, intron 
region, specific expression, promoter sequence and activity, 
biological function, similar sequences, homologs, 
reference information, and the like. 

In a preferable embodiment, the nucleic acid molecule
or the variant thereof to be stored in the storage medium
of the present invention comprises a sequence of at least
eight contiguous nucleotides of substantially all the
sequences encoding sequences selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or sequences with one
or more amino acid substitutions, additions and/or deletions
thereto. Such information could not be provided by prior
art technologies, and thus should be recognized to be an
effect attained for the first time by the present invention.

In other embodiments, when the reading frame of Table 2 is
f-1, f-2 or f-3, the nucleic acid molecule or the variant
thereof to be recorded in the storage medium of the present
invention has a sequence from the position of nucleic acid
number (sense strand, start) of SEQ ID NO: 1 of Table 2
to the position of nucleic acid number (sense strand, stop),
or a sequence having at least 70 % homology thereto, or when
the reading frame of Table 2 is r-1, r-2 or r-3, the nucleic
acid molecule has a sequence from the position of nucleic
acid number (antisense strand, start) of SEQ ID NO: 1087
of Table 2 to the position of nucleic acid number (antisense
strand, stop), or a sequence having at least 70 % homology
thereto. Such a storage medium with information recorded
thereon has never been conventionally provided, and thus
the storage medium of the present invention has an
advantageous effect in allowing analysis of the entire
genome. Preferably, the storage medium of the present
invention includes information about substantially all the
open reading frame sequences. As used herein, preferable
forms and constitutions relating to such preferable sequences
may take advantage of any forms and constitutions described
herein above.

In another aspect, the present invention provides a
storage medium, comprising information about a polypeptide
or a variant thereof having at least an amino acid sequence
selected from the group consisting of SEQ ID NO: 2-341,
343-722, 724-1086, 1088-1468, 1470-1837 and 1839-2157, or
a sequence having at least 70 % homology thereto, located
therein. As used herein, preferable forms and
constitutions relating to such preferable sequences may take
advantage of any forms and constitutions described herein
above.

In another embodiment, the polypeptide or the variant
thereof to be stored in the storage medium of the present
invention with respect to information thereabout has a
sequence of at least three contiguous amino acids of at least
an amino acid sequence selected from the group consisting
of SEQ ID NO: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, or a sequence having at least 70 % homology
thereto. As used herein, the preferable forms and
constitutions of such preferable sequences may take
advantage of any of the forms and constitutions described
herein above.

In another embodiment, the polypeptide or the variant
thereof to be stored in the storage medium of the present
invention with respect to information thereabout has a
sequence of at least eight contiguous amino acids of at least
an amino acid sequence selected from the group consisting
of SEQ ID NO: 2-341, 343-722, 724-1086, 1088-1468, 1470-1837
and 1839-2157, or a sequence having at least 70 % homology
thereto. As used herein, the preferable forms and
constitutions of such preferable sequences may take
advantage of any of the forms and constitutions described
herein above.

In another embodiment, the polypeptide or the variant
thereof to be stored in the storage medium of the present
invention with respect to information thereabout has a
sequence of at least three contiguous or non-contiguous
amino acids of an amino acid sequence selected from the group
consisting of SEQ ID NO: 2-341, 343-722, 724-1086, 1088-1468,
1470-1837 and 1839-2157, or a sequence having at least 70 %
homology thereto, and has a biological function. As used
herein, preferable forms and constitutions of such
preferable sequences may take advantage of any of the forms
and constitutions described herein above.

In another embodiment, the biological activity to be
included in the storage medium of the present invention with
respect to information thereof comprises a function set
forth in Table 2. As used herein, preferable forms and
constitutions of such preferable activities may take
advantage of any forms and constitutions described herein
above.

In another aspect, the present invention provides a
biomolecule chip having at least one antibody against a
polypeptide or a variant thereof, located on a substrate,
wherein the polypeptide or the variant thereof comprises at
least one amino acid sequence selected from the group
consisting of SEQ ID NOs: 2-341, 343-722, 724-1086,
1088-1468, 1470-1837 and 1839-2157, or a sequence having
at least 70 % homology thereto. As used herein, preferable
forms and constitutions of preferable sequences may take
advantage of any forms and constitutions described herein
above.

In another aspect, the present invention provides an
RNAi molecule having a sequence homologous to a reading frame
sequence wherein, when the reading frame of Table 2 is f-1,
f-2 or f-3, the reading frame sequence has a sequence from
the position of nucleic acid number (sense strand, start)
of SEQ ID NO: 1 of Table 2 to the position of nucleic acid
number (sense strand, stop), or a sequence having at least
70 % homology thereto, or when the reading frame of Table
2 is r-1, r-2 or r-3, the reading frame sequence has a
sequence from the position of nucleic acid number (antisense
strand, start) of SEQ ID NO: 1087 of Table 2 to the position
of nucleic acid number (antisense strand, stop), or a sequence
having at least 70 % homology thereto. As used herein, such
an RNAi molecule may take any form described herein above
in detail, and those skilled in the art may make and use
any appropriate RNAi molecule once the sequence information
of the present invention is given.

In preferable embodiments, the RNAi molecule of the
present invention is an RNA or a variant thereof comprising
a double-stranded portion at least 10 nucleotides in length.

In a more preferable embodiment, the RNAi molecule 
comprises a 3' overhang.

In another preferable embodiment, the above 3'
overhang is a DNA molecule of two or more
nucleotides in length.

In other preferable embodiments, the 3' overhang is
a DNA molecule of 2-4 nucleotides.
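
Purely as an illustrative sketch, the strands of such a
molecule may be derived from a target sequence as shown
below. The 19-nucleotide duplex length, the "tt" overhang
and the example target are assumptions chosen for the
example; the checks only mirror the limits stated above (a
double-stranded portion of at least 10 nucleotides and a DNA
overhang of 2-4 nucleotides). Lower-case letters mark the DNA
overhang.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def design_duplex(target_dna, start=0, duplex_length=19, overhang="tt"):
    """Return (sense, antisense) strands written 5'->3'.

    `target_dna` is the sense-strand DNA of the targeted reading frame,
    `start` is a 0-based offset into it.
    """
    if duplex_length < 10:
        raise ValueError("double-stranded portion must be at least 10 nt")
    if not 2 <= len(overhang) <= 4:
        raise ValueError("overhang must be 2-4 nucleotides")
    region = target_dna.upper()[start:start + duplex_length]
    if len(region) < duplex_length or set(region) - set("ACGT"):
        raise ValueError("target region too short or not plain DNA")
    sense_rna = region.replace("T", "U")
    # The antisense strand is the reverse complement of the sense strand.
    antisense_rna = sense_rna.translate(RNA_COMPLEMENT)[::-1]
    return sense_rna + overhang, antisense_rna + overhang


# Hypothetical target sequence used only for illustration.
sense, antisense = design_duplex("ATGGCTAAAGGTCCGATTACGGA")
```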

Such RNAi molecules may be used for suppressing
particular functions of hyperthermophilic archaebacteria.
Such RNAi molecules were not attainable
by the prior art, and thus the present invention attains
significant effects in this regard.



All patents, patent applications, journal
articles and other references mentioned herein are
incorporated by reference in their entirety.

The present invention has heretofore been described
with reference to preferred embodiments to facilitate
understanding of the present invention. Hereinafter, the
present invention will be described by way of examples.
Examples described below are provided for illustrative
purposes only. Accordingly, the scope of the present
invention is limited only by the appended claims.

EXAMPLES 

Hereinafter, the present invention will be
described in more detail by way of examples. It should
be understood that the present invention is not limited to
the examples below.

(EXAMPLE 1: Genomic sequencing) 

(Preparation of chromosomal DNA of the KOD-1 strain)

The KOD-1 strain was inoculated into 1000 ml of 0.5
x 2216 Marine Broth medium as described in Appl. Environ.
Microbiol. 60(12), 4559-4566 (1994) (2216 Marine Broth
18.7 g/L, PIPES 3.48 g/L, CaCl2·H2O 0.725 g/L, 0.4 mL 0.2%
resazurin, 475 mL artificial sea water (NaCl 28.16 g/L, KCl
0.7 g/L, MgCl2·6H2O 5.5 g/L, MgSO4·7H2O 6.9 g/L), distilled
water 500 mL, pH 7.0) and cultured using a 2 liter fermenter.
During culture, nitrogen gas was introduced into the
fermenter, and the internal pressure was maintained at
0.1 kg/cm2. The culture was maintained at a temperature of
85 ± 1 °C for fourteen hours. Further, the culture was
carried out as a static culture, and no aeration or agitation
with the nitrogen gas was performed. After
culture, the bacteria (about 1,000 ml) were recovered by
centrifugation at 10,000 rpm for 10 minutes.

One gram of the resulting bacterial pellet was suspended
in 10 ml of Solution A (50 mM Tris-HCl, 50 mM EDTA, pH 8.0)
and centrifuged (8,000 rpm, 5 minutes, 4 °C) to pellet the
bacteria. The pellet was suspended in 3 ml of Solution A
containing 15 % sucrose and maintained at 37 °C for 30
minutes, and 3 ml of Solution A containing 1 % N-lauryl
sarcosine was added thereto. 5.4 g of cesium chloride and
300 µl of 10 mg/ml ethidium bromide were added to the
solution, which was ultracentrifuged at 55,000 rpm for 16
hours at 18 °C to fractionate the chromosomal DNA. The
resultant chromosomal DNA fractions were subjected to
n-butanol extraction to remove ethidium bromide, and dialyzed
against TE solution (10 mM Tris-HCl (pH 8.0), 0.1 mM EDTA)
to yield chromosomal DNA.

(Screening/sequencing analysis of the chromosomal
library)

Determination of the genomic sequence was performed
according to the bottom-down approach, as generally
performed in the art. In brief, the outline is as follows:
first, isolated DNA was fragmented and cloned into a cloning
vector such as pUC. Next, cloned fragments were sequenced
by shotgun sequencing. These sequencing reactions were
performed at about 15,000 per 1 Mbp. The sequences
determined in each reaction were assembled into
groups of sequences called "contigs".
Thereafter, gaps between the contigs (physical and sequence
gaps) were cloned, and the gaps were sequenced to fill the
gaps. Thereafter, the base sequence data was analyzed
to identify open reading frames and to perform
annotation. The details are as follows:

First, genomic libraries were constructed. As used
herein, in order to prevent bias derived from genetic
sequences, physical digestion rather than partial
digestion using restriction enzymes was performed. In
this case, libraries of a plurality of lengths were
constructed. Plasmid libraries containing 2-3 kbp fragments,
and lambda phage libraries containing about 20 kbp fragments,
were constructed.

Second, shotgun sequencing of plasmid libraries was
performed. A sequencer commercially available from Applied
Biosystems was used for sequencing. As used herein, such
sequencing was performed so that 400-500 bp base sequences
may be obtained for about 15,000 reactions per 1 Mbp.
Similarly, terminal shotgun sequencing of the lambda phage
library was performed. As such, theoretically, it was
calculated that the entire full-length genome was sequenced
six times or more.
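
The redundancy estimate can be checked with simple
arithmetic. The sketch below assumes the figure of about
15,000 sequencing reactions per 1 Mbp given in the outline of
this Example and read lengths of 400-500 bp; the function
name is hypothetical and the result is an idealized average
that ignores failed or overlapping reads.

```python
def fold_coverage(reads_per_mbp, read_length_bp):
    """Average number of times each base is read (idealized)."""
    return reads_per_mbp * read_length_bp / 1_000_000


# Assumed inputs: about 15,000 reads per Mbp, reads of 400 and 500 bp.
low = fold_coverage(15_000, 400)
high = fold_coverage(15_000, 500)
```

Under these assumptions the redundancy lies between six-fold
and 7.5-fold, in line with the statement that the genome was
sequenced six times or more.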

Third, base sequence data (about 40,000 pieces of data
for the about 2 Mbp genome) was assembled to fill in the gaps.
In this instance, terminal sequence data from the lambda
phage library consisting of long fragments was used to
determine the relative positions and the direction of each
region. What is obtained by this procedure is usually called
a "contig". In the present Example, a number of contigs were
obtained. Sequence-undetermined regions (gaps)
therebetween were filled. When fragments were identified
that fill the gap between contigs, such gaps are called
sequence gaps, and gaps for which no such fragments were
cloned are called physical gaps. Filling such physical
gaps was performed by engineering techniques, such as
amplification by LA-PCR and the like, followed by base
sequence determination and the like. As such, substantially
all the sequencing data fell within one contig, and the
sequencing was thus completed.

Fourth, the sequence data was analyzed. Open reading
frames (ORFs) were identified and the annotation thereof was
performed. In this task, programs based on a Hidden Markov
model (HMM), an Interpolated Markov model (GLIMMER) and the
like were used for identification of ORFs. Thereafter, the
search functions of BLAST, BLASTX and FASTA and the like
were used to identify the function of each ORF. Thereafter,
genetic and biochemical analyses were performed (see, for
example, Fraser C.M., Res. Microbiol., 151, 79-84 (2000);
Fraser C.M. et al., Nature, 406, 799-803 (2000); Nelson et
al., Nat. Biotechnol., 18, 1049-1054 (2000); Kawarabayasi
Y. et al., DNA Res., 6, 83-101, 145-222 (1999) and the like).
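
To make the notion of an open reading frame concrete, a
deliberately naive six-frame scan is sketched below. It is
not the HMM or GLIMMER method named above: it merely reports
stretches that begin with ATG and end at the first in-frame
stop codon. The frame labels (f-1 to f-3 and r-1 to r-3 for
offsets 0 to 2 on each strand), the 1-based coordinates, the
ATG-only start rule and the example sequence are all
assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(sequence, min_codons=3):
    """Return (frame, start, stop, orf) tuples for all six frames.

    start/stop are 1-based inclusive positions on the strand being read;
    min_codons counts the stop codon as well.
    """
    results = []
    sense = sequence.upper()
    for label, strand in (("f", sense), ("r", reverse_complement(sense))):
        for offset in range(3):
            start = None
            for pos in range(offset, len(strand) - 2, 3):
                codon = strand[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    if (pos + 3 - start) // 3 >= min_codons:
                        results.append((f"{label}-{offset + 1}", start + 1,
                                        pos + 3, strand[start:pos + 3]))
                    start = None
    return results


# Hypothetical sequence used only for illustration.
orfs = find_orfs("CCATGAAATAGGG")
```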

The nucleic acid sequences determined as above are the
sequences set forth in SEQ ID NO: 1 (SEQ ID NOs: 1, 342
and 723 are plus (sense) strand, and SEQ ID NOs: 1087, 1469
and 1838 are minus (antisense) strand).

(Functional analysis of each gene)

Next, the amino acid sequence of each gene was compared
to those known in the art, as registered in databases such
as EMBL, PDB and the like, by using software such as DNASIS,
BLAST, and CLUSTAL W. As a result, a variety of polypeptides
having high homology with said amino acid sequences were
identified, and the function of each gene was inferred
therefrom (see Table 2).

(EXAMPLE 2: targeting)
(double cross-over disruption) 
(Bacterial strains and growth conditions) 
T. kodakaraensis KOD1 and derivatives thereof were
cultured under strictly anaerobic conditions at 85 °C in
rich growth medium (ASW-YT) or amino acid-containing
synthetic medium (ASW-AA). ASW-YT medium contains 5.0 g/L
yeast extract, 5.0 g/L tryptone and 0.2 g/L sulfur (pH 6.6)
in artificial sea water diluted 1.25-fold (0.8 x ASW).
The composition of ASW is as follows: NaCl 20 g; MgCl2·6H2O
3 g; MgSO4·7H2O 6 g; (NH4)2SO4 1 g; NaHCO3 0.2 g; CaCl2·2H2O
0.3 g; KCl 0.5 g; NaBr 0.05 g; SrCl2·6H2O 0.02 g; and
Fe(NH4) citrate 0.01 g.
ASW-AA medium is 0.8 x ASW supplemented with 5.0 ml/L
modified Wolfe minor mineral (containing in 1 L, 0.5 g
MnSO4·2H2O; 0.1 g CoCl2; 0.1 g ZnSO4; 0.01 g CuSO4·5H2O;
0.01 g AlK(SO4)2; 0.01 g H3BO3; and 0.01 g NaMoO4·2H2O),
5.0 ml/L vitamin mixture (see the following literature),
twenty amino acids (containing 250 mg cysteine·HCl; 75 mg
alanine; 125 mg arginine·HCl; 100 mg asparagine·H2O; 50 mg
aspartic acid; 50 mg glutamine; 200 mg glutamic acid; 200 mg
glycine; 100 mg histidine·HCl·H2O; 100 mg isoleucine; 100 mg
leucine; 100 mg lysine·HCl; 75 mg methionine; 75 mg
phenylalanine; 125 mg proline; 75 mg serine; 100 mg
threonine; 75 mg tryptophan; 100 mg tyrosine; and 50 mg
valine in 1 L) and 0.2 g/L elemental sulfur (pH adjusted to
6.9 with NaOH) (Robb, F.T., and A.R. Place. 1995. Media for
Thermophiles, p. 167-168. In F.T. Robb and A.R. Place (ed.),
Archaea: a laboratory manual - Thermophiles. Cold Spring
Harbor Press, Cold Spring Harbor, N.Y.). Optionally, 5-FOA
(Wako Pure Chemical, Osaka, Japan) and uracil (Kojin, Tokyo,
Japan) were added to ASW-AA medium at the concentrations
described in Robb. In order to examine tryptophan nutrient
requirement, tryptophan-free ASW-AA (ASW-AAW-) was used.
In order to reduce dissolved oxygen in the medium, 5.0 %
Na2S·9H2O was added until the color of sodium resazurin salt
(1.0 mg/L) disappeared. In the case of plate culture,
1.0 % (w/v) Gelrite (Wako Pure Chemical) was added for
solidification, and in lieu of the elemental sulfur and the
5.0 % Na2S·9H2O solution, 2.0 ml/L polysulfide solution
(10 g Na2S·9H2O and 3.0 g elemental sulfur /15 ml) was used.
The cells were incubated in an anaerobic chamber (Tabai
Espec, Osaka, Japan) at 85 °C.
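
As a convenience, the salt quantities for the 0.8 x ASW base
may be worked out as sketched below. The sketch assumes that
the ASW quantities listed above are grams per liter of
full-strength ASW, which is not stated explicitly; the
function name and default values are likewise hypothetical.

```python
# Grams per liter of full-strength ASW (per-liter basis is an assumption).
ASW_GRAMS_PER_L = {
    "NaCl": 20.0,
    "MgCl2.6H2O": 3.0,
    "MgSO4.7H2O": 6.0,
    "(NH4)2SO4": 1.0,
    "NaHCO3": 0.2,
    "CaCl2.2H2O": 0.3,
    "KCl": 0.5,
    "NaBr": 0.05,
    "SrCl2.6H2O": 0.02,
    "Fe(NH4) citrate": 0.01,
}


def diluted_asw(strength=0.8, volume_l=1.0):
    """Grams of each salt needed for `volume_l` liters at the given strength."""
    if strength <= 0 or volume_l <= 0:
        raise ValueError("strength and volume must be positive")
    return {salt: round(grams * strength * volume_l, 4)
            for salt, grams in ASW_GRAMS_PER_L.items()}


recipe = diluted_asw()  # 1 L of 0.8 x ASW
```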

DH5-alpha, an E. coli strain used for general DNA engineering,
was routinely cultured on LB medium (Sambrook, J., and D.
Russell. 2001. Molecular cloning: a laboratory manual, 3rd
edn. Cold Spring Harbor Press, Cold Spring Harbor, N.Y.)
which was supplemented with 50 µg/ml ampicillin as necessary.

(Mutation by UV radiation and Isolation of 5-FOA 
resistant variants) 

T. kodakaraensis KOD1 was cultured in 2.0 L of ASW-AA
liquid medium for 39 hours. Cells in the stationary
phase were recovered by centrifugation (6,000 x g, 30
minutes). The following procedures were performed
in an anaerobic chamber: cells
were resuspended in 60 mL of ASW, and a portion of the
suspension (10 mL) was placed into a petri dish. The
suspension was UV-irradiated for an appropriate time (0, 30,
60, 90 and 120 seconds) at a distance of 20 cm from a 15 W
sterilization lamp, with agitation. Aliquots (200 µl)
were plated on ASW-AA plate medium containing 0.75 % 5-FOA,
and uracil-auxotrophic (Pyr-) variants were
positively selected. In order to support growth of the
resultant variants, 10 µg/ml uracil was included in the
growth media. The cells were incubated at 85 °C for five
days. The number of viable cells was determined by
inoculation onto ASW-AA plate medium free of 5-FOA at an
appropriate dilution ratio, and counting the number of
colonies formed.
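
By way of example, the viable count and the surviving
fraction after irradiation may be calculated from colony
numbers as sketched below. The 200 µl default plating
volume is borrowed from the aliquot size mentioned above and
is an assumption for the 5-FOA-free plates; the function
names and the numbers in the usage lines are hypothetical.

```python
def viable_cells_per_ml(colonies, dilution_factor, plated_volume_ul=200):
    """Colony-forming units per ml of the undiluted suspension."""
    if colonies < 0 or dilution_factor <= 0 or plated_volume_ul <= 0:
        raise ValueError("invalid colony count, dilution or volume")
    # 1000 µl per ml converts the plated volume to ml.
    return colonies * dilution_factor * 1000 / plated_volume_ul


def survival_fraction(cfu_irradiated, cfu_unirradiated):
    """Fraction of cells surviving relative to the 0-second control."""
    if cfu_unirradiated <= 0:
        raise ValueError("control count must be positive")
    return cfu_irradiated / cfu_unirradiated


# Hypothetical counts: 50 colonies from a 1,000-fold dilution (control)
# and 5 colonies from the same dilution after irradiation.
control = viable_cells_per_ml(50, 1000)
treated = viable_cells_per_ml(5, 1000)
fraction = survival_fraction(treated, control)
```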

5-FOA-resistant colonies were isolated and cultured in ASW-YT
liquid medium. The cells were incubated in ASW-AA liquid
medium for two days in order to avoid carry-over of uracil,
and passaged into ASW-AA liquid medium with or without
5 µg/ml uracil to study the nutritional requirement of the
isolates for uracil.

(Enzymatic Assay) 

Cell-free extracts of T. kodakaraensis KOD1 and
variants thereof were prepared as follows: cells were
cultured in ASW-YT liquid medium for twenty hours and
collected by centrifugation (6,000 x g, 30 minutes), and the
cells were resuspended in 50 mM Tris-HCl (pH 7.5) containing
0.1 % v/v Triton X-100. The samples were vortexed for ten
minutes and centrifuged at 3,000 x g for twenty minutes, and
the resultant supernatant was retained as cell-free extract.
Protein concentration was determined using the Bio-Rad
Protein Assay System (Bio-Rad, Hercules, CA, USA) using
bovine serum albumin as a standard.

Orotidine-5'-monophosphate decarboxylase
(OMPdecase, PyrF) activity was determined by
monitoring the reduction in optical density at 285 nm
(OD285nm) derived from the conversion of
orotidine-5'-monophosphate (OMP) into
uridine-5'-monophosphate (UMP) (Beckwith, J. R., A.
B. Pardee, R. Austrian, and F. Jacob. 1962. Coordination
of the synthesis of the enzymes in the pyrimidine
pathway of E. coli. J. Mol. Biol. 5: 618-634). The
assay mixture consisted of 100 mM Tris-HCl (pH 8.6),
1.5 mM MgCl2, 0.125 mM OMP and enzyme solution in 1 ml
in total. This mixture was preincubated at 85 °C for
5 minutes in a capped cuvette, and the reaction was
initiated by adding an enzyme solution and monitored
for 10 minutes at the same temperature.

Orotate phosphoribosyltransferase (OPRTase, PyrE)
activity was assayed by spectrophotometrically measuring
orotic acid at 295 nm. When measuring an enzyme sample from
a pyrE+ strain, continuous decarboxylation of the reaction
product OMP by intrinsic OMP decase should be taken into
account. As OMP decase activity is higher than OPRTase
activity in T. kodakaraensis, OPRTase activity may be
determined using a Δε295 of 3,670 M⁻¹·cm⁻¹. This does not
correspond to the conversion from orotic acid to UMP via
OMP. In the case of the pyrF⁻ strain, we monitored the
conversion of the starting substrate to OMP by means of a
Δε295 of 2,520 M⁻¹·cm⁻¹.

This reaction was performed in a 1 ml mixture comprising
Tris-HCl (pH 8.6), 1.5 mM MgCl2, 0.125 mM orotic acid,
cell-free extract, and 1.6 mM
5-phosphoribosylpyrophosphate (PRPP). The same assay
mixture free of PRPP was placed in a capped cuvette and
preincubated at 85 °C for 10 minutes, and the reaction was
initiated by the addition of PRPP. The decrease in A295 was
measured at the same temperature for three minutes.
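
As a minimal sketch of how the A295 readings described above could be converted into a specific activity, the following Python snippet applies the Beer-Lambert law with the two Δε295 values quoted in the text. The function name, the 1 cm path length, and the example readings are illustrative assumptions and are not taken from the original.

```python
# Delta-epsilon values at 295 nm as quoted in the text (M^-1 cm^-1).
DELTA_EPS_295_WITH_OMPDECASE = 3670.0  # samples with intrinsic OMP decase
DELTA_EPS_295_PYRF_MINUS = 2520.0      # pyrF- strain (substrate -> OMP)


def oprtase_specific_activity(delta_a295_per_min, delta_eps, protein_mg,
                              volume_ml=1.0, path_cm=1.0):
    """Return specific activity in nmol of orotic acid consumed per minute
    per mg of protein.

    path_cm = 1.0 is an assumption (standard cuvette); volume_ml = 1.0
    follows the 1 ml assay mixture described in the text.
    """
    # Beer-Lambert law: concentration change (mol/L per min)
    rate_molar_per_min = delta_a295_per_min / (delta_eps * path_cm)
    # mol/L * L -> mol, then mol -> nmol
    nmol_per_min = rate_molar_per_min * (volume_ml * 1e-3) * 1e9
    return nmol_per_min / protein_mg


# Hypothetical reading: A295 falls by 0.0367 per minute with 0.1 mg protein.
example = oprtase_specific_activity(0.0367, DELTA_EPS_295_WITH_OMPDECASE, 0.1)
```

Choosing the Δε295 value according to the strain background, as the text describes, is the only strain-dependent step; the rest is unit conversion.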

(DNA engineering and sequencing) 
General DNA engineering was performed as described in
Sambrook and Russell (Sambrook, J., and D. Russell. 2001.
Molecular cloning: a laboratory manual, 3rd edn. Cold Spring
Harbor Press, Cold Spring Harbor, N.Y.). The genomic DNA
of T. kodakaraensis was isolated as described above. PCR
was performed using KOD -Plus- (TOYOBO, Osaka, Japan) as
the DNA polymerase. The sequences of the primers used for
PCR are shown below. Optionally, DNA fragments amplified
by PCR were phosphorylated with T4 kinase (TOYOBO).
Restriction enzymes and modification enzymes were purchased
from TaKaRa (Kyoto, Japan) or TOYOBO. DNA fragments were
collected after agarose gel electrophoresis, and the GFX PCR
DNA and Gel Band Purification Kit (Amersham Pharmacia
Biotech, Uppsala, Sweden) was used for purification thereof.
Plasmid DNA was isolated using Qiagen Plasmid Kits (Qiagen,
Hilden, Germany). DNA sequencing was performed using an ABI
PRISM kit and a Model 3100 capillary sequencer (Applied
Biosystems, Foster City, CA, USA).

(Construction of pUDT1 and pUDT2)

Two disruption vectors, pUDT1 (SEQ ID NO: 2158)
and pUDT2 (SEQ ID NO: 2159), were constructed for
homologous recombination by single and double cross-over
events, respectively, in T. kodakaraensis. They were
constructed as follows: a DNA fragment (676 bp) containing
Tk-pyrF was amplified from T. kodakaraensis KOD1 genomic
DNA using the following primers TK1-DUR/TK1-DUF:

TK1-DUR/TK1-DUF: 5'-GGGCATATGGAGGAGAGGAGGGTCATTGTGGCG-3'
(SEQ ID NO: 2160) / 5'-GTGAGGGGGTGTTTGAGTTTGAA-3' (SEQ ID
NO: 2161), wherein underlined sequences indicate NdeI sites.

A deduced promoter region (130 bp) was amplified using
primers TK2-DPR/TK2-DPF:

TK2-DPR/TK2-DPF:
5'-GGGCTGCAGCCGGAAGGCGGATTTTGGTGAGGGGAAAA-3' (SEQ ID NO: 2162)
/ 5'-GGGCATATGCATCACCTTTTTAACGGCCCTCTCCAAGAG-3' (SEQ ID NO:
2163), wherein underlined sequences indicate PstI and NdeI
sites, respectively.

Both fragments were subcloned into pUC118 in the
appropriate promoter-pyrF orientation. The resultant plasmid
was designated pUD (3,944 bp). A short fragment (788 bp)
of Tk-trpE was amplified using the following primers
TK3-DTR/TK3-DTF:

TK3-DTR/TK3-DTF:
5'-GGGGCATGCGGTGGCTTCGTTGGCTACGTCTCCTACG-3' (SEQ ID NO: 2164)
/ 5'-GGGCTGCAGTTCGGGGCTCCGGTTAGTGTTCCCGCCG-3' (SEQ ID NO:
2165), wherein underlined sequences indicate SphI and PstI
sites, respectively. Next, this fragment was ligated with
pUD at the SphI and PstI sites to yield pUDT1 (4,732 bp).
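
The primer design described above can be sanity-checked with a short script. The sketch below, offered only as an illustration, searches the primers as printed in this text for the standard recognition sequences of NdeI, PstI and SphI, and confirms that the stated plasmid sizes are consistent with simple addition of the trpE fragment to pUD. The helper name `sites_in` is hypothetical, and the sequences are copied from the text as printed, so any transcription error in the source would carry over.

```python
# Standard recognition sequences of the enzymes named in the text.
SITES = {"NdeI": "CATATG", "PstI": "CTGCAG", "SphI": "GCATGC"}

# Primers keyed by SEQ ID NO, copied from the text as printed.
PRIMERS = {
    2160: "GGGCATATGGAGGAGAGGAGGGTCATTGTGGCG",
    2161: "GTGAGGGGGTGTTTGAGTTTGAA",
    2162: "GGGCTGCAGCCGGAAGGCGGATTTTGGTGAGGGGAAAA",
    2163: "GGGCATATGCATCACCTTTTTAACGGCCCTCTCCAAGAG",
    2164: "GGGGCATGCGGTGGCTTCGTTGGCTACGTCTCCTACG",
    2165: "GGGCTGCAGTTCGGGGCTCCGGTTAGTGTTCCCGCCG",
}


def sites_in(sequence):
    """Return the sorted names of the listed enzymes whose site occurs."""
    return sorted(name for name, site in SITES.items() if site in sequence)


# Sizes in bp as stated in the text.
PUD_BP = 3944
TRPE_FRAGMENT_BP = 788
PUDT1_BP = 4732

for seq_id, primer in PRIMERS.items():
    print(seq_id, sites_in(primer))
print("pUD + trpE fragment =", PUD_BP + TRPE_FRAGMENT_BP, "bp")
```

In this listing, SEQ ID NO: 2161 carries none of the three sites, so only the first primer of the TK1 pair introduces an NdeI site.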

In order to construct pUDT2, a fragment containing
Tk-trpE and its flanking regions (2,223 bp) was amplified
using the following primers TK4-DT2R/TK4-DT2F:

TK4-DT2R/TK4-DT2F:
5'-GGGGTCGACCGGGTCTGGCGAGGGCAATGAGGGAC-3' (SEQ ID NO: 2166)
/ 5'-GGGGAATTCGGTTATAGTGTTCGGAACGACCTTCACTC-3' (SEQ ID NO:
2167), wherein underlined sequences indicate SalI and
EcoRI sites, respectively.

This was subcloned into the SalI and EcoRI sites of pUC119.
The resultant plasmid was designated pUT4 (5,340 bp). pUD
was digested with PvuII, and the fragment containing pyrF
and the deduced promoter region (1,104 bp) was isolated.
pUDT2 (6,012 bp) was obtained by inserting the isolated
fragment into pUT4 at the blunt-ended SacI sites of Tk-trpE.

Linear DNA fragments for homologous recombination in
T. kodakaraensis were prepared by PCR using pUDT2 as a
template, and purified after agarose gel electrophoresis.

(Transformation of T. kodakaraensis) 

The calcium chloride method for Methanococcus voltae
PS (Bertani, G., and L. Baresi. 1987. Genetic transformation
in the methanogen Methanococcus voltae PS. J. Bacteriol.
169: 2730-2738.) was modified for transformation of
T. kodakaraensis. T. kodakaraensis KU25 was cultured for
twelve hours in ASW-YT liquid medium, and cells were
collected from 3 ml of broth during late log phase (17,000
× g, 5 minutes) and resuspended in 200 µl of transformation
buffer (80 mM CaCl2 in 0.8× modified ASW free of KH2PO4, in
order to avoid precipitation between calcium cations and
phosphate groups) (1/15 vol.). This was maintained on ice
for 30 minutes. Next, 3 µg of DNA was dissolved in TE
buffer and added to the suspension. Further, the cells were
incubated on ice for one hour, followed by heat shock at
85 °C for 45 seconds, and further incubated on ice for 10
minutes. As a control experiment, an equal volume of TE
buffer was added to the cells in lieu of DNA. Processed
cells were screened for Pyr+ transformants by passaging for
two generations in the absence of uracil in 20 ml of ASW-AA
liquid medium. Next, the cells were spread on an ASW-AA
plate free of uracil and incubated for 5-8 days at 85 °C.
The resultant Pyr+ strains were analyzed by colony PCR and
by Southern hybridization using a DIG-DNA labeling and
detection kit (Boehringer Mannheim, Mannheim, Germany).

(EXPERIMENTAL PROCEDURES)

Double targeting disruption was performed using
circular DNA molecules for double cross-over gene
disruption. The exemplary scheme is shown in Figure 1.

(Preparation of a disruption vector)
(Preparation of KOD-1)

The KOD-1 strain was prepared as described above.

(Transformation and homologous recombination)

As described above, the transformed KOD-1 strain was
maintained in ASW-AA. In this instance, growth of the KOD-1
strain is sustained by carried-over uracil.

Next, the KOD-1 strain was inoculated into fresh amino
acid liquid medium. Only PyrF+ strains, in which homologous
recombination has occurred, grow in fresh amino acid liquid
medium; this allowed screening and isolation of strains in
which homologous recombination had occurred.

Next, isolated strains were inoculated into ASW-AA.
Colonies grown on solid medium were confirmed by colony
PCR and Southern blotting analysis. The procedure therefor
is described as follows:

Reaction mixture: 2.5 units KOD polymerase (TOYOBO), 0.5
µl; 10× KOD polymerase buffer (TOYOBO), 5.0 µl; 25 mM MgCl2,
4.0 µl; dNTP mixture, 4.0 µl; 20 pmol/µl primer 1, 0.5 µl;
20 pmol/µl primer 2, 0.5 µl; sterilized water, 37.0 µl;
cell suspension, 0.5 µl.

This reaction mixture was incubated under the
following reaction conditions: 96 °C for 2 minutes; 30
cycles of 96 °C for 30 seconds, 55 °C for 3 seconds and
72 °C for 30 seconds; and 72 °C for 3 minutes.
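
The colony PCR set-up above can be written down as data, which makes the total volume and the programmed run time easy to check. The following is a sketch only: the dictionary keys and function names are illustrative, and it assumes that the 30 cycles apply to the three middle steps, with the first and last steps performed once.

```python
# Component volumes in microlitres, as listed in the reaction mixture.
MIX_UL = {
    "KOD polymerase": 0.5,
    "10x KOD polymerase buffer": 5.0,
    "25 mM MgCl2": 4.0,
    "dNTP mixture": 4.0,
    "primer 1 (20 pmol/ul)": 0.5,
    "primer 2 (20 pmol/ul)": 0.5,
    "sterilized water": 37.0,
    "cell suspension": 0.5,
}


def total_volume_ul(mix=MIX_UL):
    """Sum of all component volumes in microlitres."""
    return sum(mix.values())


def final_concentration(stock_conc, added_ul, mix=MIX_UL):
    """Dilute a stock into the full mixture (same unit as stock_conc).

    Only the separately added component is counted; anything already
    present in the 10x buffer is ignored.
    """
    return stock_conc * added_ul / total_volume_ul(mix)


# Thermal program in seconds (assumed structure, see lead-in).
INITIAL_DENATURATION_S = 120      # 96 C, 2 minutes
CYCLE_STEPS_S = (30, 3, 30)       # 96 C / 55 C / 72 C
CYCLES = 30
FINAL_EXTENSION_S = 180           # 72 C, 3 minutes


def program_hold_time_s():
    """Total programmed hold time, excluding temperature ramping."""
    return (INITIAL_DENATURATION_S
            + CYCLES * sum(CYCLE_STEPS_S)
            + FINAL_EXTENSION_S)


print(total_volume_ul(), "ul total")
print(round(final_concentration(25.0, 4.0), 2), "mM added MgCl2")
print(program_hold_time_s() / 60, "min of programmed holds")
```

Note that the listed volumes add up to 52.0 µl rather than a round 50 µl; the sketch simply reports what the text states.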

Colony PCR and Southern blotting analyses were
performed to yield the following results:

TABLE 3: Double cross-over gene targeted disruption

                             Control         Transformant 1  Transformant 2
CaCl2                        +               +               +
DNA                          TE buffer       pUDT2           pUDT2
Growth in amino acid liquid  No growth       Growth          Growth
medium in the presence of
carried-over uracil
T/C                          not available   12/12           5/12
Total T/C                    not available   17/24

T/C refers to the number of transformants of interest
(i.e., PyrF+ strains) per number of colonies screened by
colony PCR.

As shown in the above results, it was demonstrated that
targeted double cross-over disruption of genes using
circular molecules proceeds at a very high ratio.

(EXAMPLE 3: Examples of double cross-over disruption;
cases where linear DNA was used)

Next, examples of double cross-over using linear DNA
molecules are shown.

(Production of the disruption vector)

Linear DNA was prepared as shown in Figure 2 as a linear
disruption vector. The linear DNA was obtained by
amplification with appropriate primers, using pUDT2
prepared in Example 2 as a template.

(Preparation of KOD-1)

The KOD-1 strain was prepared as described in Example 2.

(Transformation and homologous recombination)
The prepared KOD-1 strain was transformed using the
calcium chloride method. The transformed KOD-1 strain was
maintained in ASW-AA. In this instance, growth of the KOD-1
strain is sustained by carried-over uracil.

Next, the KOD-1 strain was inoculated into fresh amino
acid liquid medium. Only PyrF+ strains, in which homologous
recombination has occurred, grow in fresh amino acid liquid
medium, allowing screening and isolation of strains in
which homologous recombination has occurred.

Next, isolated strains were inoculated into ASW-AA.
Then colonies grown on the solid medium were confirmed by
colony PCR and Southern blotting analysis. The procedure
therefor is described as follows:

Colony PCR and Southern blotting were performed as
described above.

As analyzed above, the following results were
obtained.



TABLE 4: Gene targeted disruption by double cross-over

                             Control         Transformant 3  Transformant 4
CaCl2                        +               +               +
DNA                          TE buffer       Linear DNA      Linear DNA
Growth in amino acid liquid  No growth       Growth          Growth
medium in the presence of
carried-over uracil
T/C                          not available   7/12            0/12
Total T/C                    not available   7/24

T/C refers to the number of transformants of interest
(i.e., PyrF+ strains) per number of colonies screened by
colony PCR.



As shown in the above results, it was demonstrated that
targeted double cross-over disruption of genes using linear
molecules proceeds at a sufficiently high ratio, although
lower than that using circular molecules. It is thought
that one reason for the lower ratio than that observed
using circular molecules is digestion of the linear
molecules by host nucleases.

Further, in light of the above-mentioned results, when
determining a preferable length for linear DNA, if there
are at least 500 bases at each terminus, targeted disruption
progresses at about 5 % or more, and if there are at least
1,000 bases at each terminus, targeted disruption
progresses at about 20 % or more. Accordingly, it is
understood that targeted disruption using a linear molecule
requires at least 500 bases, and preferably at least 1,000
bases, of nucleic acid sequence at both termini.
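
The length guideline stated above can be expressed as a small lookup. This is a sketch under the assumption that the shorter of the two flanking regions is what limits the efficiency; the function name and the return convention (`None` when the text gives no figure) are illustrative only.

```python
def expected_min_efficiency(left_arm_bp, right_arm_bp):
    """Return the approximate lower bound on targeted-disruption
    efficiency (as a fraction) stated in the text for a linear molecule,
    or None when the flanking regions are shorter than 500 bases.
    """
    shorter = min(left_arm_bp, right_arm_bp)  # assumption: shorter arm limits
    if shorter >= 1000:
        return 0.20   # "about 20 % or more"
    if shorter >= 500:
        return 0.05   # "about 5 % or more"
    return None       # the text gives no figure below 500 bases


for arms in [(1200, 1000), (1500, 600), (400, 2000)]:
    print(arms, expected_min_efficiency(*arms))
```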



(EXAMPLE 4: Examples of double cross-over disruption:
other genes)

A gene other than the above-mentioned genes (for
example, a sequence encoding SEQ ID NO: 395 (tryptophan
synthase)) was selected to perform similar experiments based
on the nutritional requirement for tryptophan, and similar
targeted disruption was performed.

(EXAMPLE 5: Single cross-over disruption)
Gene targeted disruption was performed using a
circular molecule in a single cross-over disruption
system. A schematic drawing is shown in Figure 3. pUDT1 (SEQ
ID NO: 2158) was prepared as described above.

(Preparation of KOD-1)
The KOD-1 strain was prepared as described in Example 2.

(Transformation and homologous recombination)
The prepared KOD-1 strain was transformed by the calcium
chloride method. The transformed KOD-1 strain was
maintained in ASW-AA. In this instance, the KOD-1 strain
grows with carried-over uracil.

Next, the KOD-1 strain was inoculated into fresh amino
acid liquid medium. As only PyrF+ strains, in which
homologous recombination has occurred, grow in fresh amino
acid liquid medium, this allows screening and enrichment of
those in which homologous recombination has occurred.

Next, grown strains were inoculated into ASW-AA.
Then colonies grown on the solid medium were confirmed by
colony PCR and Southern blotting analysis. The procedure
therefor is described as follows:

Colony PCR and Southern blotting were performed as
described above.

As analyzed above, the following results were
obtained.



TABLE 5: Gene targeted disruption by single cross-over

                             Control         Transformant 5  Transformant 6
CaCl2                        +               +               +
DNA                          TE buffer       pUDT1           pUDT1
Growth in amino acid liquid  No growth       Growth          Growth
medium in the presence of
carried-over uracil
T/C                          not available   1/96            2/96
Total T/C                    not available   3/192

T/C refers to the number of transformants of interest
(i.e., PyrF+ strains) per number of colonies screened by
colony PCR.

As described above, it is understood that gene
targeted disruption by single cross-over using a circular
molecule progresses at a much lower rate than gene
targeted disruption by double cross-over. One reason why
the efficiency of single cross-over is lower than that of
double cross-over is believed to be the digestion of pUDT1
by restriction enzymes from the host.

As such, the present invention is demonstrated to work
in a system using single cross-over disruption. Further,
when using a linear molecule, the system using single
cross-over disruption works, although at a much lower rate.
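
To make the comparison between Tables 3 to 5 concrete, the pooled T/C ratios can be computed directly from the tabulated counts. The snippet below is a minimal sketch; the dictionary labels and the function name are illustrative, and the counts are those given in the tables.

```python
from fractions import Fraction

# (transformants of interest, colonies screened) per experiment.
RESULTS = {
    "double cross-over, circular pUDT2 (Table 3)": [(12, 12), (5, 12)],
    "double cross-over, linear DNA (Table 4)": [(7, 12), (0, 12)],
    "single cross-over, circular pUDT1 (Table 5)": [(1, 96), (2, 96)],
}


def pooled_ratio(counts):
    """Pool several experiments into a single T/C fraction."""
    hits = sum(t for t, _ in counts)
    screened = sum(c for _, c in counts)
    return Fraction(hits, screened)


for label, counts in RESULTS.items():
    ratio = pooled_ratio(counts)
    print(f"{label}: {ratio} = {float(ratio):.1%}")
```

The pooled values reproduce the "Total T/C" rows of the tables (17/24, 7/24 and 3/192).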


(EXAMPLE 6: Examples of single cross-over disruption;
other genes)

Genes were disrupted by single cross-over as in
Example 4, and it was demonstrated that disruption was
possible, although the efficiency thereof was not as good
as in Example 5.

(EXAMPLE 7: Expression of DNA ligase gene)

In order to express an ATP-dependent DNA ligase in
Escherichia coli, the following protocols were used.
Fragments of the phage clone comprising the sequence of the
DNA ligase identified in the present invention (for example,
SEQ ID NO: 1131) were used as a template to yield fragments
of two types of DNA ligase coding regions, which were
inserted into pUC18. The sequences of the inserted
fragments were confirmed, and the fragments comprising the
DNA ligase from the plasmid were inserted into the plasmid
pET21a (Novagen) to construct the expression plasmids. The
expression and the activity were confirmed as follows:

Escherichia coli BL21 (DE3) was transformed with the
plasmid. The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was
continued at 37 °C. After culture, cells were collected by
centrifugation, disrupted by sonication, and centrifuged to
yield a cell extract; this was again disrupted by
sonication and centrifuged to recover the soluble fraction.
The resultant fraction was treated at 70 °C for ten
minutes, and the thermostable soluble fraction was
centrifuged again to yield a sample. This sample may be
further purified using a variety of well-known purification
methods or a combination thereof.

Enzymatic activity is measured by a method of reacting
the obtained sample with lambda phage DNA digested with
HindIII and observing the change in mobility of the DNA
fragments by agarose gel electrophoresis; or by a method of
reacting the obtained sample with oligo dT labeled with
32P, removing unreacted 32P with alkaline phosphatase, and
then measuring the radioactivity thereof (see Rossi, R. et
al., (1997) Nucleic Acids Research, 25(11): 2106-2113;
Odell, M. et al., (1996) Virology 221: 120-129; Sriskanda,
V. et al., (1998) Nucleic Acids Research, 26(20):
4618-4625; Takahashi, M. et al., (1984) The Journal of
Biological Chemistry, 259(16): 10041-10047).

(EXAMPLE 8: Expression and confirmation of formic
acid dehydrogenase)

Formic acid dehydrogenase is an enzyme catalyzing a
reaction oxidizing formate ion into CO2. The reaction
thereof is represented by the formula:
HCOO⁻ + NAD⁺ → CO2 + NADH.
As used herein, NAD (nicotinamide adenine dinucleotide;
the reduced form is NADH) is one of the coenzymes involved
in redox reactions.

Formic acid dehydrogenase activity is measured using,
for example, NADP+ (340 nm, ε=6.22×10^3), methyl viologen
(600 nm, ε=1.13×10^4), or benzyl viologen (605 nm,
ε=1.47×10^4) (Andreesen, J.R. et al., (1974)
J. Bacteriol., 120: 6-14).

Known formic acid dehydrogenases include a homodimer
consisting only of alpha subunits, a heterodimer and a
heterotetramer consisting of alpha and beta subunits, and
a dodecamer consisting of alpha, beta and gamma subunits.

Formic acid dehydrogenases of the present invention
may consist of single or plural subunits. Preferably, the
formic acid dehydrogenases consist of two or more subunits.

(Expression of thermostable formic acid
dehydrogenase)

In order to express the formic acid dehydrogenases
(SEQ ID NO: 305, 673, 1050 and 1051) encoded by open
reading frames obtained by the present invention in
Escherichia coli, the following operations were performed:
fragments containing the open reading frames were amplified
by PCR and inserted into plasmid pET21a(+) (Novagen) to
yield expression plasmids. These plasmids were used to
transform the Escherichia coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was
continued at 37 °C for four hours. After culture, cells
were collected by centrifugation, disrupted by sonication,
and centrifuged to yield a cell extract. The resultant cell
extract was heated at 80 °C for fifteen minutes and then
centrifuged to yield the supernatant thereof, which was
used as a crude enzyme solution.

The crude enzyme solution was measured for its formic
acid dehydrogenase activity according to a routine method
(Andreesen, J.R. et al., (1974) J. Bacteriol., 120:
6-14). Further, the enzyme has an optimum temperature of
90 °C.

(EXAMPLE 9: hyperthermostable beta-glycosidase)
Beta-glycosidases collectively refer to a group of
enzymes hydrolyzing a beta-glycoside bond. Beta-glycosidases
include, for example, beta-glucosidase,
beta-galactosidase, beta-mannosidase, beta-fructosidase
and the like.

Beta-galactosidase, a type of beta-glycosidase, is an
enzyme hydrolyzing beta-D-galactoside to yield D-galactose.
Degrading lactose (glucose-beta-D-galactoside) into
glucose and galactose using a galactosidase is a method for
producing low-lactose milk by processing the lactose in cow
milk. For these purposes, in addition to adding the enzyme
to milk, the use of an immobilized enzyme is also
considered. Generally, an enzyme used as an immobilized
enzyme preferably has high activity under the reaction
conditions used (pH, temperature and the like) and is
structurally stable.

As used herein, beta-galactosidase is an enzyme
hydrolyzing beta-D-galactoside to produce D-galactose, and
is systematically called beta-D-galactoside
galactohydrolase. The beta-glycosidase of the present
invention may have beta-glucosidase, beta-mannosidase
and/or beta-xylosidase activities in addition to
beta-galactosidase activity. The beta-glycosidase of the
present invention may have transferring activity in
addition to hydrolyzing activity toward oligosaccharides.

(Expression of beta-glycosidase)
Beta-glycosidase (SEQ ID NO: 1122) was expressed using
the same method as described above in the Examples. The
resultant ampicillin-resistant transformants were
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) containing ampicillin (50 µg/ml) and
cultured at 37 °C until the OD660 reached 0.5.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C. After
culture, cells were collected by centrifugation, disrupted
by sonication in 100 mM Bicine/KOH (pH 8.3)/10 mM MgCl2,
and centrifuged again to yield a soluble fraction, which
was then heated at 85 °C for thirty minutes. Heat-stable
soluble fractions were centrifuged and concentrated, and
then subjected to sodium dodecyl sulfate polyacrylamide
gel electrophoresis (SDS-PAGE) to detect a band of the
expected molecular weight, and the band was seen to
increase over time after induction by IPTG.

The sample was heat-treated as above and used for
determining the enzymological properties of the
beta-glycosidase of the present invention. As for methods
of measuring enzymatic activities, see Pisani, F.M. et al.,
Eur. J. Biochem., 187, 321-328 (1990). The enzymatic
activity liberating 1 µmol of p-nitrophenol per minute was
defined as 1 U.

The optimum pH of the beta-glycosidase of the present
invention was examined. The reaction was performed in a
variety of buffers, including 1.5 µg/ml of the enzyme with
2.8 mM pNp beta-glucopyranoside as the substrate at 75 °C.
The buffers used were sodium phosphate buffer (pH 6-8),
citrate buffer (pH 4-6), borate buffer (pH 8-9) and glycine
buffer (pH 8.5-10) (data not shown). These results show
that the beta-glycosidase has its optimum pH at around pH
6.5.

The optimum temperature for the beta-glycosidase of the
present invention was also examined. Reactions were
performed in sodium phosphate buffer (pH 6.5) including
1.5 µg/ml of the enzyme with 2.8 mM pNp beta-glucopyranoside
as the substrate at a variety of temperatures (data not
shown). As a result, the beta-glycosidase of the present
invention has its optimum temperature at around 100 °C.
Further, Arrhenius plotting was performed using this result,
and it was demonstrated that the gradient of the line
changes at around 75 °C (1/T = 2.87 × 10^-3 K^-1). The
results were applied to the formula k = A·exp(-E/RT)
(wherein k is the reaction rate constant, E is the
activation energy, R is the gas constant, T is the
absolute temperature, and A is the frequency factor), and
it was calculated that E=53.4 kJ/mol in the range of
25-75 °C, and E=17.7 kJ/mol in the range of 75-100 °C.
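
The Arrhenius relationship used above can be illustrated numerically. The sketch below converts the reciprocal-temperature value quoted for the break in the plot into degrees Celsius, and estimates by what factor the rate constant changes between two temperatures for a given activation energy. The function names are illustrative, and the gas constant value (8.314 J/(mol·K)) is a standard figure not stated in the text.

```python
import math

R_GAS = 8.314  # gas constant in J mol^-1 K^-1 (standard value)


def break_temperature_celsius(inv_t_times_1000):
    """Convert a value of (1/T) x 10^3 in K^-1 into degrees Celsius."""
    return 1000.0 / inv_t_times_1000 - 273.15


def rate_ratio(e_kj_per_mol, t_low_c, t_high_c):
    """Ratio k(t_high)/k(t_low) from k = A*exp(-E/(R*T)), assuming A and
    E are constant over the interval."""
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    exponent = -(e_kj_per_mol * 1000.0 / R_GAS) * (1.0 / t_high - 1.0 / t_low)
    return math.exp(exponent)


# 1/T x 10^3 = 2.87 corresponds to about 75.3 degrees Celsius.
print(round(break_temperature_celsius(2.87), 1))
# Fold increase in k between 25 and 75 C with E = 53.4 kJ/mol.
print(round(rate_ratio(53.4, 25.0, 75.0), 1))
# Fold increase in k between 75 and 100 C with E = 17.7 kJ/mol.
print(round(rate_ratio(17.7, 75.0, 100.0), 2))
```

The smaller activation energy above the break means the rate constant rises far less steeply with temperature in the 75-100 °C range than below it.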

The thermostability of the beta-glycosidase of the present
invention was examined. After the above samples were
incubated for a variety of times at 90 or 100 °C, enzymatic
activity was measured at 80 °C in 50 mM sodium phosphate
buffer (pH 6.5), including 1.5 µg/ml of the enzyme and using
2.8 mM pNp-beta-glucopyranoside as a substrate (data not
shown). This result indicates that the beta-glycosidase
is thermostable for about 18 hours at 90 °C and about 1
hour at 100 °C. When similar experiments were performed at
110 °C, the enzyme was inactivated after about 15 minutes.

The substrate specificity of the beta-glycosidase of the
present invention was examined. Activities against a
variety of substrates at 2.8 mM were measured at 80 °C in
50 mM sodium phosphate buffer (pH 6.5) containing 1.5 µg/ml
of enzyme, and it was demonstrated that the beta-glycosidase
of the present invention has high beta-glucosidase activity,
and further has beta-mannosidase, beta-galactosidase and
beta-xylosidase activities.

Reaction rate constants for these four enzymatic
activities were determined by measuring the activity
against substrates, incubating 2 mM of each oligosaccharide
(beta-lactose, cellobiose, cellotriose, cellotetraose and
cellopentaose) with 3.0 µg/ml enzyme at concentrations of
0.28 mM to 5.6 mM, in 50 mM sodium phosphate buffer (pH
6.5) containing 1.5 µg/ml at 80 °C for seven hours. Next,
the reaction solution was subjected to thin layer
chromatography (TLC) (data not shown). Glucose spots were
observed in lanes other than the beta-lactose lane.
Cellotetraose, a tetrasaccharide, was divided into a
trisaccharide and a monosaccharide, and cellopentaose, a
pentasaccharide, was divided into a tetrasaccharide and a
monosaccharide. These results show that the
beta-glycosidase of the present invention has an exo-type
hydrolyzing activity.
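
The product pattern that identifies exo-type hydrolysis can be stated as a simple rule: each cleavage releases one terminal glucose and leaves a chain one unit shorter. The following toy model is a sketch for illustration only (the function names are hypothetical); it reproduces the first-step products reported above for cellotetraose and cellopentaose.

```python
def exo_hydrolysis_step(dp):
    """One exo-type cleavage of a glucose oligomer.

    dp is the degree of polymerization (number of glucose units).
    Returns (remaining_chain_dp, released_unit_dp).
    """
    if dp < 2:
        raise ValueError("a monosaccharide cannot be cleaved further")
    return dp - 1, 1


def exo_intermediates(dp):
    """Chain lengths observed as one chain is trimmed down to glucose."""
    chain = [dp]
    while dp > 1:
        dp, _ = exo_hydrolysis_step(dp)
        chain.append(dp)
    return chain


print(exo_hydrolysis_step(4))  # cellotetraose -> trisaccharide + glucose
print(exo_hydrolysis_step(5))  # cellopentaose -> tetrasaccharide + glucose
print(exo_intermediates(5))
```

An endo-type enzyme would instead be expected to yield internal cleavage products, for example two disaccharides from a tetrasaccharide, which is not the pattern described in the text.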

5 mM solutions of cellobiose, cellotriose,
cellotetraose and cellopentaose in 50 mM sodium phosphate
buffer (pH 6.5) containing 3 µg/ml of enzyme were incubated
at 80 °C for four hours. Cellotetraose was also incubated
for 0, 1, 2, 4 and 7 hours in a similar reaction system.
Next, the reaction solution was subjected to thin layer
chromatography (TLC). Cellobiose, cellotriose,
cellotetraose and cellopentaose are a disaccharide, a
trisaccharide, a tetrasaccharide and a pentasaccharide,
respectively, and spots of saccharides larger than these
were observed after the reaction. This result demonstrates
that the beta-glycosidase of the present invention has
sugar-transferase activity in addition to exo-type
sugar-degrading activity. Under this reaction condition,
glucose and cellobiose increased over time, which means
that the hydrolyzing activity, rather than the transferring
activity, increased over time. That is, the
beta-glycosidase of the present invention can be applied
to the synthesis of oligosaccharides having any combination
of beta linkages, such as an oligosaccharide in which
cellobiose is linked to mannose, and the like.

(EXAMPLE 10: hyperthermophilic chitinase)
Chitin is a type of mucopolysaccharide and has the
structure of beta-poly-N-acetylglucosamine. Chitin is
present in abundance as a cell-wall substance of
arthropods, molluscs, crustaceans, insects, fungi, bacteria
and the like. Chitinase is an enzyme which hydrolyzes
chitin, and is found in the gastric juice of snails, the
exuvial fluid of insects, fruit skin, microorganisms and
the like. This enzyme produces N-acetylglucosamine by
hydrolysis of the beta-1,4 linkages of chitin, and has the
systematic name
poly(1,4-beta-(2-acetamide-2-deoxy-D-glucoside))
glucanohydrolase.

Chitinase may be industrially useful for the purpose
of decomposing chitin, which is present in abundance
in nature, into forms more available to microorganisms and
the like. Further, chitinase is also believed to play an
important role as a protection mechanism against pathogens
in plants, and thus attempts have been made to develop a
disease-resistant plant by introducing a gene encoding the
subject enzyme.

(Expression of hyperthermophilic chitinase)
As described in the above-mentioned Examples,
hyperthermophilic chitinase (SEQ ID NO: 991) was expressed.
The resultant ampicillin-resistant transformants were
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.3. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was
continued at 37 °C. After culture, cells were collected by
centrifugation, disrupted by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 70 °C for ten minutes, and then the obtained heat-stable
fraction was centrifuged to yield the supernatant thereof
as a sample, which was subjected to sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE), and the
expected band was detected at about 130 kDa.

The sample was heat-treated as above and purified
using ammonium sulfate precipitation (40% saturation), an
anion exchange column (HiTrapQ), a gel filtration column,
and an anion exchange column (MonoQ) so that only a single
band was observed on SDS-PAGE.

The enzymatic activities were measured in accordance
with the method of "Chitin, Chitosan Experimental Manual"
(Chitin Chitosan Research Ed., Gihodo Publishing) using
colloidal chitin. The amount of enzyme required to produce
a reducing sugar corresponding to 1 µmol of
N-acetylglucosamine per minute was defined as 1 U.

Colloidal chitin as a substrate was prepared as
follows: 10 g of chitin (Wako Pure Chemical) was solubilized
in 500 ml of 85 % phosphoric acid and agitated for 24 hours
at -4 °C. The viscous liquid was added to a ten-fold volume
of deionized water while agitating. The precipitate was
obtained by centrifugation, and the resultant was
repeatedly washed with deionized water until the pH thereof
was 5.0 or higher. The pH was adjusted to 7.0 with NaOH,
and the precipitate was then washed with deionized water
once more. This was solubilized in a small volume of water
and autoclaved.

The optimum temperature of the hyperthermostable
chitinase of the present invention was determined by
measuring the activities of the above-mentioned purified
enzymes in 50 mM sodium phosphate (pH 7.0) for sixty minutes
at a variety of temperatures. The reaction was terminated
by cooling on ice (data not shown). The hyperthermostable
chitinase of the present invention was shown to have an
optimum temperature of about 80 °C.

The optimum pH of the hyperthermostable chitinase of the
present invention was determined by measuring the
activities of the above-mentioned purified enzymes for
sixty minutes at a variety of pH levels using the following
buffers: 50 mM disodium hydrogen citrate-HCl (pH 2.5-4.0);
50 mM sodium acetate (pH 4.0-5.5); 50 mM MES-NaOH (pH
5.5-7.0); 50 mM Tris-HCl (pH 7.0-9.0); 50 mM glycine-NaOH
(pH 9.0-10.0). The reaction was terminated by cooling on
ice. The result is shown in Figure 5. The hyperthermostable
chitinase of the present invention was demonstrated to have
an optimum pH of about 4.0. Further, a peak was also
observed at about pH 8.0.

The effect of salt on the activity of the
hyperthermostable chitinase of the present invention was
studied by measuring the activities of the above-mentioned
purified enzymes in 50 mM sodium phosphate (pH 7.0) with
a variety of concentrations of salt (NaCl or KCl) added
thereto for 120 minutes at 80 °C. The reaction was
terminated by cooling on ice (data not shown). The activity
of the hyperthermostable chitinase of the present invention
was increased by the addition of salt, and in particular,
the addition of KCl increased the activity about two-fold.

The hyperthermostable chitinase of the present
invention was studied for the effects thereof on
oligosaccharides and colloidal chitin. The oligosaccharides
used were N-acetyl-D-glucosamine (G1),
di-N-acetyl-chitobiose (G2), tri-N-acetyl-chitotriose
(G3), tetra-N-acetyl-chitotetraose (G4),
penta-N-acetyl-chitopentaose (G5) and
hexa-N-acetyl-chitohexaose (G6). Fifty µl of reaction
mixture containing 0.7 mg of each oligosaccharide, 70 mM
sodium acetate buffer (pH 6.0), 200 mM KCl, and purified
enzyme (for G1-G3, 0.9 µg, and for G4-G6, 1.8 µg) was
incubated at 80 °C and sampled at 0, 5, 15, 30, 60 or 120
minutes thereafter. As for colloidal chitin, 1 ml of total
reaction mixture containing 0.16 mg of colloidal chitin, 50
mM sodium acetate buffer (pH 5.0), and 0.6 µg of purified
enzyme was incubated at 80 °C, sampled at 1.5, 3.0 and
4.5 hours thereafter, and centrifuged and concentrated
20-fold. Next, the samples were subjected to TLC as
follows: the sampled solution was spotted on a Kieselgel 60
silica gel plate (Merck), and a development solution
(n-butanol : methanol : 25% ammonia solution : water =
5 : 4 : 2 : 1) was used for the development thereof. After
development, the plates were dried, and a developing
reagent (prepared by mixing 4 ml of aniline, 4 g of
diphenylamine, 200 ml of acetone and 30 ml of 85 %
phosphoric acid) was sprayed thereon, and the plates were
heated at 180 °C for about five minutes for coloring (data
not shown).

10 

From this result, it was demonstrated that the
hyperthermostable chitinase of the present invention has
no degrading action against disaccharides or smaller sugars,
and that when chitin was used as a substrate, the enzyme
produced chitobiose, a disaccharide, as the main product.

The hyperthermostable chitinase of the present
invention was also studied for its effects on derivatives of
4-methylumbelliferone (4-MU). GlcNAc-4-MU, GlcNAc2-4-MU or
GlcNAc3-4-MU (0.01 mM, 10 µl), 100 mM acetate buffer (pH 5.0,
990 µl), and the purified enzyme (20 µl, 18 ng) were incubated
at 80 °C. At 0, 5, 15, 30, 45, 60, or 180 minutes, 100 µl
of the reaction solution was sampled and added to 900 µl
of ice-cold 100 mM glycine-NaOH (pH 11) to terminate the
reaction. The samples were measured with a spectrofluorometer
for fluorescence at 440 nm with excitation at 350 nm
(data not shown). As a result, reaction rates against each
substrate were determined.
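
One way such a reaction rate could be estimated from the sampled time points is a least-squares slope of fluorescence against time. The sketch below is an illustration only: the document does not state how the rates were computed, the function name `initial_rate` is hypothetical, and the readings are made-up numbers rather than data from this Example.

```python
def initial_rate(times_min, signal):
    """Least-squares slope of signal versus time (signal units per minute).

    Assumption: the early part of the time course is linear, so the
    slope approximates the initial reaction rate.
    """
    n = len(times_min)
    if n != len(signal) or n < 2:
        raise ValueError("need at least two paired time/signal points")
    mean_t = sum(times_min) / n
    mean_s = sum(signal) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times_min, signal))
    return sxy / sxx


# Hypothetical fluorescence readings at the first sampling times (minutes).
times = [0, 5, 15, 30]
readings = [0.0, 10.0, 30.0, 60.0]
print(initial_rate(times, readings))
```

Because each 100 µl sample is diluted into 900 µl of stop buffer, every reading carries the same ten-fold dilution, so it does not affect a comparison of rates between substrates.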

It has been reported that the digestion type of an
enzyme, either endo-type or exo-type, can be determined by
comparing the reaction rates against disaccharide
derivatives and against trisaccharide derivatives (Robbins, P.
W., J. Biol. Chem., 263 (1), 443-447 (1988)). In this case,
when the reaction rate against the disaccharide derivative is
the greater of the two, the enzyme is expected to
be exo-type, whereas when the reaction rate against the
trisaccharide derivative is the greater, the enzyme
is expected to be endo-type. Based on this criterion, the
hyperthermostable chitinase of the present invention is
determined to be endo-type.
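
The comparison rule above can be written out as a short sketch. The function name `classify_digestion_type` and the example rates are hypothetical and chosen only for illustration; they are not measurements from this Example.

```python
def classify_digestion_type(rate_disaccharide, rate_trisaccharide):
    """Apply the rate-comparison rule described in the text.

    Both arguments are reaction rates against the disaccharide and
    trisaccharide derivatives, expressed in the same (arbitrary) unit.
    """
    if rate_disaccharide < 0 or rate_trisaccharide < 0:
        raise ValueError("reaction rates must be non-negative")
    if rate_disaccharide > rate_trisaccharide:
        return "exo-type"
    if rate_trisaccharide > rate_disaccharide:
        return "endo-type"
    return "undetermined"  # equal rates: the rule gives no answer


# Illustrative values only (not data from this document).
print(classify_digestion_type(rate_disaccharide=0.2, rate_trisaccharide=1.5))
```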

The functions possessed by each domain of the
hyperthermostable chitinase of the present invention were
studied by creating a variety of deletion mutants. Deletion
mutants Pk-ChiAΔ1 (containing the first Bacillus circulans
chitinase homologous region and two cellulose binding
domains), Pk-ChiAΔ2 (containing the fourth Streptomyces
erythraeus chitinase homologous region and two cellulose
binding domains), Pk-ChiAΔ3 (containing the first Bacillus
circulans chitinase homologous region), and Pk-ChiAΔ4
(containing the fourth Streptomyces erythraeus chitinase
homologous region) were produced based on a previous
reference (Japanese Laid-Open Publication 11-313688).

From the cultures of E. coli transformant strains
possessing each plasmid, crude enzyme solutions were obtained
by heat treatment at 70 °C for 10 minutes. Each crude enzyme
solution was spotted on a colloidal chitin plate (0.5 %
colloidal chitin, 1.5 % agar) and was incubated to study
the activity thereof (data not shown). The deletion mutant
having only the first chitinase homologous region showed
some activity, and the deletion mutant having only the fourth
chitinase homologous region showed little activity.
All of the deletion mutants having either chitinase homologous
region and the two cellulose binding domains showed high
activity.

Thirty µl of the crude enzyme solution of each of the deletion
mutants Pk-ChiAΔ2 and Pk-ChiAΔ4 was mixed
with 30 µl of 1 % colloidal chitin, and incubated at 70 °C
for one hour. Next, the reaction solution was centrifuged
to obtain the supernatant and a precipitate containing the
colloidal chitin. The precipitate was washed
twice with 50 mM sodium phosphate (pH 7.0), and was subjected
to SDS-PAGE (data not shown). This result shows that the
two cellulose binding domains are necessary for binding to
chitin and for chitinase activity.

(Example 11: Hyperthermostable ribulose bisphosphate
carboxylase)

Ribulose bisphosphate carboxylase is an enzyme
catalyzing photosynthetic reactions and is present in plant
chloroplasts and microorganisms having photosynthetic
ability. Ribulose bisphosphate carboxylase of higher
plants is a macromolecule consisting of eight large subunits
and eight small subunits (Type I), and is a major soluble
protein in leaves of plants. On the other hand, ribulose
bisphosphate carboxylase of microorganisms such as bacteria
consists of only large subunits (Type II).

Ribulose bisphosphate carboxylase is used as a marker
for plant classification and, for example, as a cell marker
for cell fusion. Further, with a view to possible
improvement of the global environment, attempts have been made
to modify the ribulose bisphosphate carboxylase gene to produce
a plant with an increased ability to fix CO2 in the air.
The breeding of photosynthetic bacteria and the development
of devices having photosynthetic ability may also be intended.
For such purposes, it is useful to have a gene encoding a
ribulose bisphosphate carboxylase having increased
enzymatic activity and structural stability.

As used herein, the term "ribulose bisphosphate
carboxylase" refers to an enzyme that adds CO2 to ribulose
bisphosphate to produce two molecules of 3-phosphoglyceric
acid. Further, ribulose bisphosphate carboxylase has an
activity of adding O2 to ribulose bisphosphate to produce
2-phosphoglycolic acid and 3-phosphoglyceric acid
(oxygenase activity).

(Expression of hyperthermostable ribulose 
bisphosphate carboxylase) 

According to the method described in the Examples
above, the hyperthermostable ribulose bisphosphate carboxylase
(SEQ ID NO: 338) was expressed using the PCR method. The
resultant ampicillin-resistant transformants were
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)) containing ampicillin (50 µg/ml), and cultured at
37 °C until the OD660 reached 0.5.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C. After
culture, cells were collected by centrifugation, broken by
sonication in 100 mM bicine/KOH (pH 8.3)/10 mM MgCl2, and
centrifuged again to yield a soluble fraction, which was
then heated at 85 °C for thirty minutes. The heat-stable
soluble fractions were centrifuged and concentrated, and
then subjected to sodium dodecyl sulfate
polyacrylamide gel electrophoresis (SDS-PAGE). A band of the
expected molecular weight was detected, and the intensity of
the band increased over time after induction with IPTG (data
not shown).

The samples obtained by centrifugation of the
above-mentioned heat-stable soluble fractions were further
purified using an anion exchange column, Resource Q (Amersham
Pharmacia Biotech, Uppsala, Sweden), and a gel filtration
column, Superdex 200 HR 10/30 (Amersham Pharmacia Biotech,
Uppsala, Sweden), and the band was confirmed to be single
by SDS-PAGE (data not shown).

Purification was performed using AKTA explorer 10S
(Amersham Pharmacia Biotech, Uppsala, Sweden). For the
anion exchange column, separation was performed using a
gradient of 0-1.0 M NaCl in a buffer of 100 mM bicine/KOH
(pH 8.3)/10 mM MgCl2. For gel filtration, 50 mM sodium
phosphate/0.15 M NaCl buffer was used.

Analysis using gel filtration suggests that the
expressed enzyme forms an octamer consisting of only large
subunits.

The carboxylase activity of the samples purified as above
was measured using D-ribulose 1,5-bisphosphate
(RuBP) (Sigma) as substrate, in accordance with the method
described in Uemura, K. et al., Plant Cell Physiol.,
37 (3), 325-331 (1996).

First, the optimum pH of the hyperthermostable ribulose
bisphosphate carboxylase of the present invention was
studied. Reactions were performed using a buffer
containing citrate buffer (pH 5.6), sodium phosphate
buffer (pH 6.3), bicine buffer (pH 7.3, 7.8, 8.0 or 8.3), or
glycine buffer (pH 9.1 or 10.1), 10 mM MgCl2, and 30 mM RuBP
as substrate at a variety of temperatures. One unit of
activity was defined as fixing 1 µmol of CO2 per mg per
minute. The results were expressed as a ratio against the
activity at pH 8.3. These results demonstrate that the
hyperthermostable ribulose bisphosphate carboxylase has an
optimum pH of about 8.3.
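
The normalization described above, in which each activity is expressed as a ratio against the activity at pH 8.3, can be sketched as follows. The function names and the activity values are hypothetical and serve only to illustrate the calculation; they are not the measured data of this Example.

```python
def relative_activity(activity_by_ph, reference_ph=8.3):
    """Express each activity as a ratio against the activity at reference_ph."""
    reference = activity_by_ph[reference_ph]
    if reference <= 0:
        raise ValueError("reference activity must be positive")
    return {ph: activity / reference for ph, activity in activity_by_ph.items()}


def optimum_ph(activity_by_ph):
    """Return the pH at which the highest activity was recorded."""
    return max(activity_by_ph, key=activity_by_ph.get)


# Hypothetical activities in units (µmol CO2 fixed per mg per minute).
activities = {5.6: 0.5, 6.3: 1.0, 7.3: 2.0, 8.3: 4.0, 9.1: 3.0, 10.1: 1.0}
print(relative_activity(activities))
print(optimum_ph(activities))
```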

The hyperthermostable ribulose bisphosphate
carboxylase of the present invention was investigated for
its optimum temperature. Reactions were performed in
a buffer containing 100 mM bicine-KOH (pH 8.3) and
10 mM MgCl2, using 30 mM RuBP as substrate at a
variety of temperatures (data not shown). It was
demonstrated that the hyperthermostable ribulose
bisphosphate carboxylase of the present invention has an
optimum temperature of about 90 °C.

The thermostability of the hyperthermostable ribulose
bisphosphate carboxylase of the present invention was
studied. The purified enzyme was measured for its residual
activity after incubation for a variety of time periods
at 80 °C and 100 °C (data not shown). It was demonstrated
that the hyperthermostable ribulose bisphosphate carboxylase of
the present invention has a half-life of about 15 hours at
80 °C.
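
To make the meaning of a half-life of about 15 hours concrete, the sketch below computes the expected residual activity over time. It assumes simple first-order (exponential) inactivation, which is an assumption of this illustration and is not stated in the document; the function names are likewise hypothetical.

```python
import math


def residual_fraction(hours, half_life_hours=15.0):
    """Fraction of activity remaining after `hours`, assuming first-order decay."""
    if hours < 0 or half_life_hours <= 0:
        raise ValueError("hours must be >= 0 and half-life must be > 0")
    return 0.5 ** (hours / half_life_hours)


def half_life_from_measurement(hours, fraction_remaining):
    """Estimate a half-life from one residual-activity measurement.

    Uses t_half = t * ln 2 / ln(1 / fraction), again assuming first-order decay.
    """
    if hours <= 0 or not 0 < fraction_remaining < 1:
        raise ValueError("need hours > 0 and 0 < fraction_remaining < 1")
    return hours * math.log(2) / -math.log(fraction_remaining)


# With a 15-hour half-life, the fraction remaining after 15 and 30 hours.
print(residual_fraction(15))
print(residual_fraction(30))
```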

The carboxylase activity and oxygenase activity of
the hyperthermostable ribulose bisphosphate carboxylase of the
present invention were measured at 50-90 °C. Further, the τ value,
which is carboxylase activity/oxygenase activity, was
calculated (see Ezaki et al., J. Biol. Chem. 1999 Feb 19;
274 (8): 5078-82).

With the increase in carbon dioxide in the air,
environmental problems such as the greenhouse effect have
occurred. As a solution thereto, ribulose bisphosphate
carboxylase, which catalyzes carbon dioxide fixation, has
attracted attention. The ratio of oxygen to carbon dioxide
in the air is about 20:0.03, and oxygen is much more abundant
than carbon dioxide. Accordingly, for the above purpose, a
high specificity for the carboxylase reaction, that is, a
greater τ value, is required. The enzyme from the KOD-1 strain
has a higher τ value than those of conventional type II
enzymes (about 30-200X) or those of type I enzymes (about
10X), and thus is expected to be useful for more
efficient carbon dioxide fixation.
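
The ratio of carboxylase activity to oxygenase activity described above can be tabulated per temperature as in the sketch below. The function name `activity_ratio` and all numbers are hypothetical placeholders; the measured values are not given in this document.

```python
def activity_ratio(carboxylase_activity, oxygenase_activity):
    """Carboxylase activity divided by oxygenase activity, as defined in the text."""
    if carboxylase_activity < 0:
        raise ValueError("carboxylase activity must be non-negative")
    if oxygenase_activity <= 0:
        raise ValueError("oxygenase activity must be positive")
    return carboxylase_activity / oxygenase_activity


# Hypothetical (carboxylase, oxygenase) activities keyed by temperature in °C.
measurements = {50: (2.0, 0.10), 70: (6.0, 0.10), 90: (15.0, 0.05)}

ratios = {temp: activity_ratio(c, o) for temp, (c, o) in measurements.items()}
for temp in sorted(ratios):
    print(temp, ratios[temp])
```

A larger ratio means that the enzyme favors the carboxylase reaction over the oxygenase reaction under the tested conditions.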

(Example 12: fructose 1,6-bisphosphate aldolase)

In order to express the fructose 1,6-bisphosphate
aldolase (SEQ ID NO: 1275) encoded by an open reading frame
obtained by the present invention in Escherichia coli, the
following operations were performed: a fragment containing
the open reading frame was amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield an
expression plasmid. This plasmid was used to transform the
Escherichia coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for four
hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

The crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the fructose
1,6-bisphosphate aldolase activity of interest. Further,
the enzyme has an optimum temperature of 90 °C.

(Example 13: glycerol kinase)

In order to express the glycerol kinase (SEQ ID NO:
1646) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the
Escherichia coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

The crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, the enzyme has an optimum
temperature of 90 °C.

(Example 14: glutamate dehydrogenase)

In order to express the glutamate dehydrogenases (SEQ
ID NO: 1239 and 1637) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield expression
plasmids. These plasmids were used to transform the
Escherichia coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatants thereof, which were used as crude enzyme
solutions.

These crude enzyme solutions were measured according
to KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura Shoten (1982),
to confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, these enzymes have an
optimum temperature of 90 °C.

(Example 15: pyruvate kinase)

In order to express the pyruvate kinase (SEQ ID NO:
1776) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 16: enolase)

In order to express the enolase (SEQ ID NO: 681) encoded
by an open reading frame obtained by the present invention,
in Escherichia coli, the following operations were
performed: a fragment containing the open reading frame was
amplified by PCR technology and inserted into plasmid
pET21a(+) (Novagen) to yield an expression plasmid. This
plasmid was used to transform the Escherichia coli BL21 (DE3)
strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 17: fructose 1,6-bisphosphatase)

In order to express the fructose 1,6-bisphosphatase
(SEQ ID NO: 1488) encoded by an open reading frame obtained
by the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 18: hydrogenase)

In order to express the hydrogenase (the subunits of which
correspond to SEQ ID NO: 1141, 1142, 1502, and 1503) encoded
by open reading frames obtained by the present invention,
in Escherichia coli, the following operations were
performed: fragments containing the open reading frames
were amplified by PCR technology and inserted into plasmid
pET21a(+) (Novagen) to yield expression plasmids. These
plasmids were used to transform the Escherichia coli BL21
(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were
heated at 80 °C for fifteen minutes, and then centrifuged
to yield the supernatants thereof, which were used as crude
enzyme solutions.

The crude enzyme solutions were measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 19: β-glycosidase)

In order to express the β-glycosidase (SEQ ID NO: 990)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli BL21
(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 20: α-amylase)

In order to express the α-amylase (SEQ ID NO: 268)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli BL21
(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 21: deacetylase)

In order to express the deacetylase (SEQ ID NO: 1190)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli BL21
(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 22: cyclodextrin glucanotransferase)

In order to express the cyclodextrin
glucanotransferase (SEQ ID NO: 1068) encoded by an open
reading frame obtained by the present invention, in
Escherichia coli, the following operations were performed:
a fragment containing the open reading frame was amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield an expression plasmid. This plasmid was
used to transform the Escherichia coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 23: 4-α-D-glucanotransferase)

In order to express the 4-α-D-glucanotransferase (SEQ
ID NO: 1185) encoded by an open reading frame obtained by
the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 24: DNA polymerases)

In order to express the DNA polymerases (SEQ ID NO: 2,
93, 379, 648, 649, 743, 1386, 1740 and 1830) encoded by open
reading frames obtained by the present invention, in
Escherichia coli, the following operations were performed:
fragments containing the open reading frames were amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield expression plasmids. These plasmids
were used to transform the Escherichia coli BL21 (DE3)
strain.

The resultant ampicillin-resistant transformants
were inoculated into NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for four
hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were
heated at 80 °C for fifteen minutes, and then centrifuged
to yield the supernatants thereof, which were used as crude
enzyme solutions.

These crude enzyme solutions were measured according
to KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura Shoten (1982),
to confirm that each crude enzyme solution has the enzymatic
activity of interest for the respective sequence. Further,
each of these enzymes has an optimum temperature of 90 °C.

(Example 25: homing endonuclease)

In order to express the homing endonuclease (SEQ ID
NO: 2) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21 (DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated into NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·
7H2O (pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for four
hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured by a modified
endonuclease assay method according to KOSOGAKU HANDOBUKKU
(Enzyme Handbook), edited by Bunji MARUO and Nobuo TAMIYA,
published by Asakura Shoten (1982), to confirm that the crude
enzyme solution has the enzymatic activity of interest.
Further, this enzyme has an optimum temperature of 90 °C.

(Example 26: histones) 

In order to express the histones (SEQ ID NO: 173, 1470
and 1963 and the like) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield
expression plasmids. These plasmids were used to transform
the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants were
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatants thereof, which were used as crude protein
solutions.

These crude protein solutions were assayed by a method
using histone kinase as described in KOSOGAKU HANDOBUKKU
(Enzyme Handbook), edited by Bunji MARUO and Nobuo TAMIYA,
published by Asakura Shoten (1982), to confirm that the crude
protein solutions have activity as a substrate for the
enzyme of interest. Further, these proteins were stable at
90 °C.

(Example 27: histones A&B) 

In order to express the histones A and B (SEQ ID NO:
1470 and 1962) encoded by open reading frames obtained by
the present invention, in Escherichia coli, the following
operations were performed: fragments containing the open
reading frames were amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield expression
plasmids. These plasmids were used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants were
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatants thereof, which were used as crude protein
solutions.

These crude protein solutions were assayed by a
method using histone kinase as described in KOSOGAKU
HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO and
Nobuo TAMIYA, published by Asakura Shoten (1982), to confirm
that the crude protein solutions have activity as a
substrate for the enzyme of interest. Further, these
proteins were stable at 90 °C.

(Example 28: Rec protein)

In order to express the Rec protein (SEQ ID NO: 1106)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli
BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was assayed according to
Methods in Enzymology 262 (1995) to confirm that the crude
protein solution has the activity of the Rec protein.
Further, this protein was stable at 90 °C.

(Example 29: O6-methylguanine DNA methyltransferase)
In order to express the O6-methylguanine DNA
methyltransferase (SEQ ID NO: 1034) encoded by an open reading
frame obtained by the present invention, in Escherichia coli,
the following operations were performed: a fragment containing
the open reading frame was amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield an
expression plasmid. This plasmid was used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
Methods in Enzymology 262 (1995) to confirm that the crude
enzyme solution has the enzymatic activity of interest.
Further, this enzyme has an optimum temperature of 90 °C.

(Example 30: PCNA) 

In order to express the PCNA (Proliferating Cell Nuclear
Antigen) (SEQ ID NO: 93) encoded by an open reading frame
obtained by the present invention, in Escherichia coli, the
following operations were performed: a fragment containing
the open reading frame was amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield an
expression plasmid. This plasmid was used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was assayed according to
Methods in Enzymology 262 (1995) to confirm that the crude
protein solution has the activity of the PCNA protein.
Further, this protein was stable at 90 °C.

(Example 31: indole pyruvate ferredoxin
oxidoreductases)

In order to express the indole pyruvate ferredoxin
oxidoreductases (SEQ ID NOs: ) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology
and inserted into plasmid pET21a(+) (Novagen) to yield
expression plasmids. These plasmids were used to transform
the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude enzyme solutions.

These crude enzyme solutions were assayed according
to KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura Shoten (1982),
to confirm that the crude enzyme solutions have the enzymatic
activity of interest for the respective sequences. Further,
these enzymes have an optimum temperature of 90 °C for the
respective sequences.

(Example 32: glutamine synthase) 

In order to express the glutamine synthase (SEQ ID
NO: 627) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 33: anthranilate phosphoribosyl
transferases)

In order to express the anthranilate phosphoribosyl
transferases (SEQ ID NO: 394 and 1767) encoded by open
reading frames obtained by the present invention, in
Escherichia coli, the following operations were performed:
fragments containing the open reading frames were amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield expression plasmids. These plasmids were
used to transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants were
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatants thereof, which were used as crude enzyme
solutions.

The crude enzyme solutions were assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, these enzymes have an optimum
temperature of 90 °C.

(Example 34: cobyric acid synthase) 

In order to express the cobyric acid synthases (SEQ
ID NO: 137 and 1904) encoded by open reading frames obtained
by the present invention, in Escherichia coli, the following
operations were performed: fragments containing the open
reading frames were amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield expression
plasmids. These plasmids were used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants were
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatants thereof, which were used as crude enzyme
solutions.

The crude enzyme solutions were assayed according to
Methods in Enzymology, Academic Press, to confirm that the
crude enzyme solutions have the enzymatic activity of
interest. Further, these enzymes have an optimum temperature
of 90 °C.

(Example 35: phosphoribosyl anthranilate isomerase) 
In order to express the phosphoribosyl anthranilate
isomerase (SEQ ID NO: 44) encoded by an open reading frame
obtained by the present invention, in Escherichia coli, the
following operations were performed: a fragment containing
the open reading frame was amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield an
expression plasmid. This plasmid was used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 36: cobalamin synthase) 

In order to express the cobalamin synthases (SEQ ID
NO: 181, 910, 1720 and 1973) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield
expression plasmids. These plasmids were used to transform
the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude enzyme solutions.

The crude enzyme solutions were assayed according to
Methods in Enzymology, Academic Press, to confirm that the
crude enzyme solutions have the enzymatic activity of
interest. Further, these enzymes have an optimum
temperature of 90 °C.

(Example 37: indole-3-glycerol-phosphate synthase)
In order to express the indole-3-glycerol-phosphate
synthase (SEQ ID NO: 772) encoded by an open reading frame
obtained by the present invention, in Escherichia coli, the
following operations were performed: a fragment containing
the open reading frame was amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield an
expression plasmid. This plasmid was used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 38: tryptophan synthase)
In order to express the tryptophan synthases (SEQ ID
NO: 395, 774, 954 and 2032) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield
expression plasmids. These plasmids were used to transform
the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude enzyme solutions.

The crude enzyme solutions were assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, these enzymes have an
optimum temperature of 90 °C.

(Example 39: ribose phosphate pyrophosphokinase)
In order to express the ribose phosphate
pyrophosphokinase (SEQ ID NO: 701) encoded by an open reading
frame obtained by the present invention, in Escherichia coli,
the following operations were performed: a fragment
containing the open reading frame was amplified by PCR
technology and inserted into plasmid pET21a(+) (Novagen)
to yield an expression plasmid. This plasmid was used to
transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.



This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 40: glutamate synthase) 

In order to express the glutamate synthase (SEQ ID NO:
1578) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 41: orotidine-5'-phosphate decarboxylase)
In order to express the orotidine-5'-phosphate
decarboxylase (SEQ ID NO: 1096) encoded by an open reading
frame obtained by the present invention, in Escherichia coli,
the following operations were performed: a fragment
containing the open reading frame was amplified by PCR
technology and inserted into plasmid pET21a(+) (Novagen)
to yield an expression plasmid. This plasmid was used to
transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 42: anthranilate synthase)

In order to express the anthranilate synthases (SEQ ID
NO: 43 and 773) encoded by open reading frames obtained by
the present invention, in Escherichia coli, the following
operations were performed: fragments containing the open
reading frames were amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield expression
plasmids. These plasmids were used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude enzyme solutions.

The crude enzyme solutions were assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, these enzymes have an
optimum temperature of 90 °C.

(Example 43: aspartyl-tRNA synthase) 

In order to express the aspartyl-tRNA synthase (SEQ
ID NO: 808) encoded by an open reading frame obtained by
the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature of 90 °C.

(Example 44: phenylalanyl-tRNA synthase)
In order to express the phenylalanyl-tRNA synthases
(SEQ ID NO: 506 and 878) encoded by open reading frames
obtained by the present invention, in Escherichia coli, the
following operations were performed: fragments containing
the open reading frames were amplified by PCR technology and
inserted into plasmid pET21a(+) (Novagen) to yield
expression plasmids. These plasmids were used to transform
the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude enzyme solutions.

The crude enzyme solutions were assayed according to
KOSOGAKU HANDOBUKKU (Enzyme Handbook), edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura Shoten (1982), to
confirm that the crude enzyme solutions have the enzymatic
activity of interest. Further, these enzymes have an optimum
temperature of 90 °C.

(Example 45: chaperonins)

In order to express the chaperonin A (SEQ ID NO: 1368)
and the chaperonin B (SEQ ID NO: 721) encoded by open reading
frames obtained by the present invention, in Escherichia
coli, the following operations were performed: fragments
containing the open reading frames were amplified by PCR
technology and inserted into plasmid pET21a(+) (Novagen)
to yield expression plasmids. These plasmids were used
to transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)) and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were
collected by centrifugation, broken by sonication, and
centrifuged to yield cell extracts. The resultant cell
extracts were heated at 80 °C for fifteen minutes and
then centrifuged to yield the supernatants thereof, which
were used as crude protein solutions.

These crude protein solutions were assayed by the
method described in Frydman, J. et al. (1994) Nature 370, 111,
to confirm that the crude protein solutions have activity
as a substrate for the enzyme of interest. Further, these
proteins were stable at 90 °C.

(Example 46: TATA binding protein) 

In order to express the TATA binding protein (SEQ ID
NO: 31) encoded by an open reading frame obtained by the
present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)) and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was measured according to
Methods in Enzymology, Academic Press, to confirm that the
crude protein solution has the activity of the protein.
Further, this protein was stable at 90 °C.
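
The culture step above gives the NZCYM medium as percentages and
the IPTG as a final concentration. A minimal sketch of the
corresponding bench arithmetic follows. It assumes the
percentages are weight per volume (the text does not say so),
and the 500 mL culture volume and 100 mM IPTG stock are
hypothetical values chosen only for illustration.

```python
# Composition as stated in the text, ASSUMED to be % (w/v): g per 100 mL.
NZCYM_PERCENT_WV = {
    "NZ amine": 1.0,
    "NaCl": 0.5,
    "yeast extract": 0.5,
    "casamino acid": 0.1,
    "MgSO4.7H2O": 0.2,
}


def grams_needed(volume_ml, recipe=NZCYM_PERCENT_WV):
    """Grams of each component for volume_ml of medium (1 % w/v = 1 g per 100 mL)."""
    return {name: pct * volume_ml / 100.0 for name, pct in recipe.items()}


def stock_volume_ml(final_mm, culture_ml, stock_mm):
    """Volume of stock to add, from C1*V1 = C2*V2.

    Ignores the small increase in culture volume caused by the addition.
    """
    return final_mm * culture_ml / stock_mm


if __name__ == "__main__":
    culture_ml = 500  # hypothetical culture volume
    for name, grams in grams_needed(culture_ml).items():
        print(f"{name}: {grams:.2f} g")
    # 0.1 mM final IPTG (from the text) out of a hypothetical 100 mM stock
    print(f"IPTG stock: {stock_volume_ml(0.1, culture_ml, 100.0):.2f} mL")
```

The same two helpers apply unchanged to the other Examples,
which use the same medium and induction conditions.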

(Example 47: TBP-interacting protein) 
In order to express the TBP-interacting protein (SEQ
ID NO: 1289) encoded by an open reading frame obtained by
the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was measured according to
Methods in Enzymology, Academic Press, to confirm that the
crude protein solution has the activity of the protein.
Further, this protein was stable at 90 °C.

(Example 48: RNase HII)

In order to express the RNase HII (SEQ ID NO: 856)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli
BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature at 90 °C.

(Example 49: hydrogenase maturation factor)
In order to express the hydrogenase maturation factors
(SEQ ID NOs: 1144, 1154, 1156, 1516, 1518, 1519, 1869 and
1871) encoded by open reading frames obtained by the present
invention, in Escherichia coli, the following operations
were performed: fragments containing the open reading
frames were amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield expression plasmids.
These plasmids were used to transform the Escherichia coli
BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)), and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were collected
by centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatants thereof, which were used as crude protein
solutions.

These crude protein solutions were measured according
to KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura shoten (1982),
to confirm that the crude protein solutions have activity
as substrates for the enzyme of interest. Further, these
proteins were stable at 90 °C.



(Example 50: Lon protease) 

In order to express the Lon protease (SEQ ID NO: 929)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli
BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature at 90 °C.

(Example 51: thiol protease) 
In order to express the thiol protease encoded by an
open reading frame obtained by the present invention, in
Escherichia coli, the following operations were performed:
a fragment containing the open reading frame was amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield an expression plasmid. This plasmid was
used to transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature at 90 °C.

(Example 52: flagellins)

In order to express the flagellins (SEQ ID NOs: 11, 350,
351, 727, and 728) encoded by open reading frames obtained
by the present invention, in Escherichia coli, the following
operations were performed: fragments containing the open
reading frames were amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield expression
plasmids. These plasmids were used to transform the
Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)), and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were collected
by centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatants thereof, which were used as crude protein
solutions.

These crude protein solutions were measured according
to Aldridge P, Hughes KT., Curr Opin Microbiol. 2002
Apr; 5(2): 160-5 and the references cited therein, to confirm
that the crude protein solutions have activity as a substrate
for the protein of interest. Further, these proteins were
stable at 90 °C.

(Example 53: subtilin-like protease) 

In order to express the subtilin-like protease (SEQ
ID NO: 979) encoded by an open reading frame obtained by
the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature at 90 °C.

(Example 54: cell division control protein A) 
In order to express the cell division control protein
A (SEQ ID NO: 1369) encoded by an open reading frame obtained
by the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was measured for cell
division controlling activity, to confirm that the crude
protein solution has the activity of the protein of interest.
Further, this protein was stable at 90 °C.

(Example 55: endonucleases) 

In order to express the endonucleases (SEQ ID NOs: 547,
697, 900, 1450, 1702, 1716, 1731, and 2010) encoded by open
reading frames obtained by the present invention, in
Escherichia coli, the following operations were performed:
fragments containing the open reading frames were amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield expression plasmids. These plasmids
were used to transform the Escherichia coli BL21(DE3)
strain.

The resultant ampicillin-resistant transformants
were inoculated onto NZCYM medium (1 % NZ amine, 0.5 %
NaCl, 0.5 % yeast extract, 0.1 % casamino acid, 0.2 %
MgSO4·7H2O (pH 7)), and cultured at 37 °C until the OD660
reached 0.4. Isopropyl-β-D-thiogalactopyranoside (IPTG,
0.1 mM) was then added thereto and the culture was continued
at 37 °C for four hours. After culture, cells were collected
by centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts were
heated at 80 °C for fifteen minutes, and then centrifuged
to yield the supernatants thereof, which were used as crude
enzyme solutions.

These crude enzyme solutions were measured according
to KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura shoten (1982),
to confirm that the crude enzyme solutions have the enzymatic
activity of interest for the respective sequences. Further,
these enzymes have an optimum temperature at 90 °C for the
respective sequences.

(Example 56: ferredoxin) 

In order to express the ferredoxin (SEQ ID NO: 253)
encoded by an open reading frame obtained by the present
invention, in Escherichia coli, the following operations
were performed: a fragment containing the open reading
frame was amplified by PCR technology and inserted into
plasmid pET21a(+) (Novagen) to yield an expression plasmid.
This plasmid was used to transform the Escherichia coli
BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude protein
solution.

This crude protein solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude protein solution has the activity
of the protein of interest. Further, this protein was
stable at 90 °C.

(Example 57: exo-β-D-glucosaminidase)

In order to express the exo-β-D-glucosaminidase (SEQ
ID NO: 1902) encoded by an open reading frame obtained by
the present invention, in Escherichia coli, the following
operations were performed: a fragment containing the open
reading frame was amplified by PCR technology and inserted
into plasmid pET21a(+) (Novagen) to yield an expression
plasmid. This plasmid was used to transform the Escherichia
coli BL21(DE3) strain.

The resultant ampicillin-resistant transformant was
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reached 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) was then
added thereto and the culture was continued at 37 °C for
four hours. After culture, cells were collected by
centrifugation, broken by sonication, and centrifuged to
yield a cell extract. The resultant cell extract was heated
at 80 °C for fifteen minutes, and then centrifuged to yield
the supernatant thereof, which was used as a crude enzyme
solution.

This crude enzyme solution was measured according to
KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji MARUO
and Nobuo TAMIYA, published by Asakura shoten (1982), to
confirm that the crude enzyme solution has the enzymatic
activity of interest. Further, this enzyme has an optimum
temperature at 90 °C.

(Example 58: confirmation of other deduced functions) 
In order to express the gene products encoded by open
reading frames obtained by the present invention, in
Escherichia coli, the following operations are performed:
fragments containing the open reading frames are amplified
by PCR technology and inserted into plasmid pET21a(+)
(Novagen) to yield expression plasmids. These plasmids are
used to transform the Escherichia coli BL21(DE3) strain.

The resultant ampicillin-resistant transformants are
inoculated onto NZCYM medium (1 % NZ amine, 0.5 % NaCl,
0.5 % yeast extract, 0.1 % casamino acid, 0.2 % MgSO4·7H2O
(pH 7)), and cultured at 37 °C until the OD660 reaches 0.4.
Isopropyl-β-D-thiogalactopyranoside (IPTG, 0.1 mM) is then
added thereto and the culture is continued at 37 °C for four
hours. After culture, cells are collected by
centrifugation, broken by sonication, and centrifuged to
yield cell extracts. The resultant cell extracts are
heated at 80 °C for fifteen minutes, and then centrifuged
to yield the supernatants thereof, which are used as crude
enzyme solutions.

These crude enzyme solutions are measured according
to KOSOGAKU HANDOBUKKU (Enzyme handbook) edited by Bunji
MARUO and Nobuo TAMIYA, published by Asakura shoten (1982),
to confirm that the crude enzyme solutions have the activity
of interest for the respective sequences. Further, each
enzyme has an optimum temperature at, or is stable at, 90 °C
for the respective sequences.

(Example 59: Biomolecule chip - DNA chip)
Next, an exemplary preparation of a biomolecule chip
is demonstrated. In this Example, methods for aligning
DNAs having different sequences on a substrate and
immobilizing them thereon are described.

Aggregates of DNA fragments having specific sequences
of the present invention are immobilized in the form of DNA
spots on a substrate. Glass is usually used as the substrate,
but plastic may also be used. Formats for DNA chips may be
rectangular or circular. Each DNA dot comprises a DNA
encoding a different gene of the present invention, and is
immobilized onto the substrate. The size of the DNA dot is
100-200 μm in diameter in the case of microarrays, and about
10-30 μm in the case of a DNA chip.
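
The dot diameters above set an upper bound on how many genes
one substrate can carry. A minimal sketch of that estimate
follows. It assumes dots laid out on a square grid with a
centre-to-centre pitch of twice the dot diameter and a
printable area of 20 mm x 60 mm; both values are hypothetical,
and only the diameters come from the text.

```python
def spots_per_area(width_um, height_um, spot_diameter_um, pitch_factor=2):
    """Number of dots that fit on a square grid.

    All lengths are integer micrometres. The centre-to-centre pitch is
    pitch_factor * spot_diameter_um (an ASSUMED spacing, not from the text).
    """
    pitch_um = pitch_factor * spot_diameter_um
    columns = width_um // pitch_um
    rows = height_um // pitch_um
    return columns * rows


if __name__ == "__main__":
    # Hypothetical printable area: 20 mm x 60 mm.
    width_um, height_um = 20_000, 60_000
    # Diameter ranges stated in the text for microarrays and DNA chips.
    for label, diameter_um in [
        ("microarray, 200 um", 200),
        ("microarray, 100 um", 100),
        ("DNA chip, 30 um", 30),
        ("DNA chip, 10 um", 10),
    ]:
        count = spots_per_area(width_um, height_um, diameter_um)
        print(f"{label}: up to {count} dots")
```

Under these assumptions even the largest dot size leaves room
for several thousand dots on a single substrate.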

Next, methods for forming each DNA spot are described.
For example, a DNA solution of interest is deposited onto a
DNA substrate using a pin method, an inkjet method or the like.

An exemplary preparation of DNA chips prepared
thereby is shown in Figure 7.

(Example 60: Biomolecule chip - Protein Chip) 
Next, an exemplary preparation of biomolecule chips
is demonstrated. In this Example, methods for aligning
proteins having different sequences on a substrate and
immobilizing them thereon are described.

Aggregates of the protein fragments of specific
sequences of the present invention are immobilized on a
substrate in the form of dots. Glass is usually used as the
substrate, but plastic may also be used. Formats may be
rectangular, as with a DNA chip, or circular. Each protein
dot comprises a protein from a different gene of the present
invention and is immobilized onto the substrate. The size
of the protein dot is 100-200 μm in diameter in the case of
microarrays, and about 10-30 μm in the case of a DNA chip.

Next, methods for forming each protein spot are
described. For example, the protein solution of interest
is deposited onto a protein substrate using a pin method, an
inkjet method or the like.

An exemplary preparation of protein chips prepared
thereby is shown in Figure 7. Their appearance is similar
to that of a DNA chip.

Although certain preferred embodiments have been
described herein, it is not intended that such embodiments
be construed as limitations on the scope of the invention
except as set forth in the appended claims. Various other
modifications and equivalents will be apparent to and can
be readily made by those skilled in the art, after reading
the description herein, without departing from the scope
and spirit of this invention. All patents, published patent
applications and publications cited herein are incorporated
by reference as if set forth fully herein.

(Effects of the invention)

The present invention provides a method and kit for
gene targeting in an efficient and accurate manner at any
position in the genome of an organism. Further, information
on the entire genomic sequence of Thermococcus
kodakaraensis KOD1, and the gene information contained
therein, are also provided.

INDUSTRIAL APPLICABILITY

The present invention provides a variety of
hyperthermostable gene products, and thus is useful in
providing a method and kit for gene targeting in an efficient
and accurate manner at any position in the genome of an
organism. Such a variety of hyperthermostable gene
products are applicable to global analysis of a
hyperthermostable organism in genomic analysis and the
like.



